
Bachelor's and Master's Projects

We always have new project topics from the areas of our current research. Contact us.

Project Map Matching Mobile Phones to Public Transit Vehicles


Gerrit Freiwald, Robin Wu, SS 2022, Project Homepage

Map matching can be used to match a given sequence of GPS points to a digital model of the real world. ‘Traditional’ map matching, as used in car navigation systems, matches against a static map. In contrast, when working with a public transit vehicle network, the ‘map’ contains the positions of each vehicle, which are highly dynamic.
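To illustrate what matching against a dynamic ‘map’ can look like, here is a minimal Python sketch. It is not the project's method: it simply assumes that the phone trace and every vehicle trace are sampled at the same timestamps and picks the vehicle with the smallest average distance. The function names, vehicle ids, and coordinates are hypothetical.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def match_vehicle(phone_trace, vehicle_traces):
    """Return the id of the vehicle that is closest to the phone on average.

    phone_trace: dict timestamp -> (lat, lon)
    vehicle_traces: dict vehicle_id -> dict timestamp -> (lat, lon)
    Only timestamps present in both traces are compared (an assumption).
    """
    best_id, best_avg = None, float("inf")
    for vehicle_id, trace in vehicle_traces.items():
        common = [t for t in phone_trace if t in trace]
        if not common:
            continue
        avg = sum(haversine_m(phone_trace[t], trace[t]) for t in common) / len(common)
        if avg < best_avg:
            best_id, best_avg = vehicle_id, avg
    return best_id

# Made-up sample data: one phone trace and two vehicles, sampled every 30 seconds.
phone = {0: (47.9990, 7.8421), 30: (47.9995, 7.8430), 60: (48.0001, 7.8440)}
vehicles = {
    "tram_1": {0: (47.9991, 7.8420), 30: (47.9996, 7.8431), 60: (48.0000, 7.8441)},
    "bus_7": {0: (47.9950, 7.8500), 30: (47.9960, 7.8490), 60: (47.9970, 7.8480)},
}
print(match_vehicle(phone, vehicles))  # tram_1
```

A realistic matcher would additionally have to cope with GPS noise, unsynchronized timestamps, and vehicles that share a route.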

Segmentation of layout-based documents

Elias Kempf, WS 2021, Project Homepage

PDF is a widely used file format and in most cases very convenient to use for representing text. However, PDF is layout-based, i.e., text is only saved character by character and not even necessarily in the right order. This makes tasks like keyword search or text extraction pretty difficult. The goal of this project is to detect individual words, text blocks, and the reading order of a PDF document to allow for reconstruction of plain text.

Spelling Correction and Autocompletion for Mobile Devices

Ziang Lu, SS 2021, Project Homepage

A virtual keyboard is a powerful tool for smartphones, with which users can improve the quality and efficiency of the input. In this project, we will explore how to use n-gram models to develop an Android keyboard which gives accurate corrections and completions efficiently.
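As a rough illustration of n-gram-based completion, the following Python sketch trains a bigram model on a tiny made-up corpus and suggests completions for a typed prefix given the previous word. It is a simplification under stated assumptions, not the project's Android implementation; `train_bigrams` and `complete` are hypothetical names.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def complete(model, prev_word, prefix, k=3):
    """Return up to k words starting with prefix, most frequent after prev_word first."""
    candidates = model.get(prev_word.lower(), Counter())
    matches = [(w, c) for w, c in candidates.items() if w.startswith(prefix.lower())]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))  # by count, then alphabetically
    return [w for w, _ in matches[:k]]

# Made-up training corpus.
corpus = [
    "see you tomorrow",
    "see you today",
    "see you tomorrow morning",
    "thank you very much",
]
model = train_bigrams(corpus)
print(complete(model, "you", "to"))  # ['tomorrow', 'today']
```

Spelling correction could be layered on top by also admitting candidates within a small edit distance of the typed prefix.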


Sentence Segmentation

Krisztina Agoston, SS 2021, Project Homepage

Sentence segmentation is a basic part of many natural language processing (NLP) tasks. The leading NLP Python libraries spaCy and NLTK offer pre-trained models for this task. These models often fail on a specific domain. The goal of this project is to measure the performance of these libraries and compare their results with a custom-made LSTM model on special domains like Wikipedia or arXiv.


Tokenization repair using Transformers

Sebastian Walter, WS 2020/21, Project Homepage

This project tackles the tokenization repair problem using the Transformer neural network architecture. We achieve results that match the performance of previous work on multiple tokenization repair benchmarks paired with usable runtimes in practice.


Circular Transit Maps

Jonathan Hauser, SS 2019, Project Homepage

Transit maps can be found in many places. Replacing the actual road geometry with a simpler geometry, such as arcs, makes the result not only more aesthetically pleasing but also more readable. Nonetheless, the original road layout should not be disregarded completely, to avoid confusion when reading the map.


UniPal (A chatbot for the course catalogue of the Uni Freiburg)

Pascal Muckenhirn and Tanyu Tanev, SS 2019, Project Homepage

UniPal is a chatbot designed to simplify finding information about your courses and tutorials. Instead of crawling the web, as students normally do to get information, you can now just ask UniPal and it'll fetch the information for you, just like any of your best pals. If you're interested in our project, feel free to check out our project homepage, where you can contact UniPal directly!



Natalie Prange, WS 2018, Project Homepage

Systems based on machine learning such as Question Answering or Question Completion systems require large question datasets for training. However, large question datasets that are not restricted to a specific kind of question are hard to find. The WebQuestions (Berant et al., 2013) and Free917 (Cai & Yates, 2013) datasets both contain fewer than 10,000 questions. The SimpleQuestions dataset (Bordes et al., 2015) contains 108,442 questions, but the questions are limited to simple questions over Freebase triples of the form (subject, relationship, object). The 30M Factoid Question-Answer corpus (Serban et al., 2016) contains 30 million questions; however, these questions, too, are limited to simple Freebase triple questions.

We introduce a question dataset containing 4,390,597 questions and corresponding answer entities that are generated by rephrasing Wikipedia sentences as questions. The rough pipeline of the question generation (QG) system is as follows: A Wikipedia dump with Freebase entity mentions is preprocessed by annotating entities with their types. The preprocessed Wikipedia dump is then parsed using a dependency parser. For each sentence, entities that fulfill certain grammatical criteria are selected as answer entities. A fitting WH-word is selected for an answer entity and various transformations are performed over the sentence to rephrase it as a question. Finally, the generated questions are filtered to avoid ungrammatical or otherwise unreasonable questions. The following section describes the system pipeline in more detail. 


Complete Search UI

Olivier Puraye, WS 2018, Project Homepage

  • Make any table-structured data file searchable using the different features of CompleteSearch

  • Automatic detection of separators

  • Validate the entire file and report syntax errors with their line numbers

  • Analyse input file and determine suitable parameters for each of its columns

  • Build a nice and easy-to-use web app

  • Make all search engine settings adjustable via the web app


Concept Neurons

João Carvalho, WS 2018, Project Homepage

This webpage showcases the master project developed by João Carvalho at the Chair of Algorithms and Data Structures of the University of Freiburg, as part of the MSc degree in Computer Science.

In this project we explored the capabilities of neural language models. More precisely, we questioned if a neural network would be able to encode Part-of-Speech (POS) tags in its neurons, just by training a simple language model.

We first trained a byte-level language model with a Long Short-Term Memory (LSTM) network using a large collection of text. Then, taking a sentence, which can be viewed as a byte sequence, we used its inner representations (the cell states of the LSTM), along with its corresponding POS tags, as the inputs and targets to train a logistic regression classifier. Looking at the classifier weights, we observed that some concepts (POS tags) are encoded in one neuron, i.e., the POS tag of a byte can be derived from one neuron's activation value, while others are derived with more than one neuron together with the logistic regression classifier. For some tags, using three neurons yielded satisfactory results.

The idea for this project started from the OpenAI paper (Radford et al., 2017). In this article, the authors found a dimension in the cell states (a neuron) that strongly correlates with the semantic concept of sentiment, which they called the Sentiment Neuron. In this project we also replicated their results.


Football Data Extraction for Broccoli

Jonas Bischofberger, WS 2018, Project Homepage

The Broccoli search engine answers queries about a broad range of entities, but lacks information in more specific domains. The task was to choose an appropriate one of these domains, obtain relational and full-text data from that domain and integrate it into the current Broccoli version. For this project, data about association football players (e.g. height, birth date, current team) and teams (e.g. date of foundation) was chosen.


Search Engine for OSM Data

Iradj Solouk, WS 2018, Project Homepage

The motivation behind this project is to build a search engine for OSM data whose results are comparable to those of Nominatim. The aim of this project is to set up a basic web application (parser, index structures, basic ranking, and the UI) that can be improved in future work, which will focus mainly on the ranking functionality. In the following, the structure of the application, certain parts of it, and their development are presented. The order of presentation also reflects the order of development of the application and the logical execution sequence.


Tabular Information Extraction

Tobias Matysiak, SS 2018, Project Homepage

This project aims to simplify the creation of SPARQL queries for the knowledge base Freebase. Instead of finding out the relevant Freebase types and relations by hand, the user specifies table columns in a simple table description format.



Mohamed Abou-Hussein, Omar Shehata, WS 2017/18, Project Homepage

The General Transit Feed Specification (GTFS) defines a common format, as a group of files, for public transportation schedules and associated geographic information. GTFS allows public agencies to publish their data as a feed. The goal of the project was to build a tool (GTFS mapper) that, given a GTFS feed, generates an alternative feed in the same format and describing the same region, but using open source data. The motivation is that the data uploaded by the public agencies is not always sufficient.

OpenStreetMap (OSM) is an open source project that provides maps of the world. It is built by volunteers from all over the world. GTFS mapper is a tool that, given a GTFS feed for a certain city and the OSM data representing the same city, outputs a new feed in the same GTFS format, but produced using information and coordinates from the OSM data.


Search with Regular Expressions

Christian Reitter, WS 2017/18, Project Homepage

This project evaluates the design of a "search as you type" interface for special regular expressions on a large set of scanning data.

The extensive and complex database of Perl-compatible regular expressions of the Nmap project was chosen as a practical application for this search. Its database consists of about 11,300 regular expressions that represent the bulk knowledge about varying binary response characteristics of a large number of network protocols and software implementations, allowing practical fingerprinting on various levels of software, vendor, and device. An additional metadata syntax provided by the Nmap probe format, augmented with standardized Common Platform Enumeration (CPE) identifiers, allows for the categorization of additional information, including the ability to extract target-specific information with the help of capturing groups within the regular expressions.



Julian Bürklin, Daniel Kemen, SS 2017, Project Homepage

QLever is a full-featured SPARQL+Text engine which returns result tuples for given SPARQL queries. The old UI is very simplistic and hard to use. The goal of our project is to create a simple, powerful and intuitive UI for QLever which supports suggestions and auto-completions.



Evgeny Anatskiy, SS 2017, Project Homepage

The main goal was to create an easy-to-use web application that takes a dataset (CSV, TSV), automatically determines the separator, validates the data (removing rows with invalid syntax), defines search facets, and saves the file (pre-processing). The saved file is then used as input for the search engine CompleteSearch, which generates the indices (post-processing).
CompleteSearch does all the work of performing the search in the uploaded dataset. The web application (this project) serves as a middle layer that processes and corrects the user input and sends it to a separate local CompleteSearch server.
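The pre-processing steps of separator detection and row validation might be sketched in Python as follows. This is a simple heuristic under stated assumptions, not the project's code: the separator is taken to be the candidate that splits the most lines into the same number of fields, and a row counts as valid if it has as many fields as the header. The function names and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter

def detect_separator(text, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that splits the most lines into the same number (> 1) of fields."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    best, best_score = None, 0
    for sep in candidates:
        widths = Counter(len(row) for row in csv.reader(lines, delimiter=sep))
        width, freq = widths.most_common(1)[0]
        if width > 1 and freq > best_score:
            best, best_score = sep, freq
    return best

def split_valid_rows(text, sep):
    """Return (valid_rows, bad_line_numbers), using the header width as the reference."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header = rows[0]
    valid, bad = [header], []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) == len(header):
            valid.append(row)
        else:
            bad.append(line_no)
    return valid, bad

# Made-up TSV sample with one malformed line (line 3).
sample = (
    "city\tcountry\tpopulation\n"
    "Freiburg\tGermany\t230000\n"
    "broken line without tabs\n"
    "Basel\tSwitzerland\t170000\n"
)
sep = detect_separator(sample)
valid, bad = split_valid_rows(sample, sep)
print(repr(sep), len(valid), bad)  # '\t' 3 [3]
```

The reported line numbers are only reliable for files without quoted multi-line fields, since one record then corresponds to one line.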


Pseudo Database for Existing XML Files

Nishat Fariha, SS 2017, Project Homepage

The simulation tool SmartCalc.CTM (developed internally at Fraunhofer ISE) uses a set of XML files as its data source. These files contain data about material properties of components of a photovoltaic module. The choice of XML files instead of an SQL database was motivated by human readability and by the possibility of giving files with newly measured material properties to customers, who also run the software.



Julian Löffler, Rezart Quelibari, Matthias Urban, SS 2016, Project Homepage

The goal of this project is to set up a web app, where one can upload any CSV dataset, and then have a convenient search (with meaningful default settings), without having to set up anything oneself.



Louis Retter, Frank Gelhausen, SS 2016, Project Homepage

This project consisted of getting to know Deepdive as well as finding out how well it performs and finding possible use cases. Deepdive is a data management system which can extract entities from a given text and predict the probability of entities being engaged in a given relation using machine learning. These predictions are based on user-defined features, not algorithms, which is the main advantage of using Deepdive over other systems.


Efficient Code for (De)Compression

Zhiwei Zhang, WS 2015/2016, Project Homepage

In this project I implemented the Elias-Gamma algorithm, the Elias-Delta algorithm, the Golomb algorithm, the Variable-Bytes algorithm, and a heavily optimized decompression function for the Simple8b algorithm, and compared them to find the most appropriate algorithm(s) for the search engine Broccoli.
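For readers unfamiliar with these codes, here is a minimal Python sketch of Elias-Gamma coding, the simplest of the listed schemes. It works on bit strings for readability, which an efficient implementation like the one in this project would presumably avoid; the function names and the sample numbers are illustrative only.

```python
def elias_gamma_encode(n):
    """Encode a positive integer: (length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias-Gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode a concatenation of Elias-Gamma codes into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the run of zeros gives the payload length minus one
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

# Made-up gaps between consecutive document ids in an inverted list.
gaps = [3, 1, 4, 10]
encoded = "".join(elias_gamma_encode(g) for g in gaps)
print(elias_gamma_encode(5))        # 00101
print(elias_gamma_decode(encoded))  # [3, 1, 4, 10]
```

Small numbers get short codes, which is why such schemes are typically applied to the gaps between sorted ids rather than to the ids themselves.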


Lexical Semantics

Max Lotstein, SS 2015, Project Homepage

This project models the meanings of words based on how those words are used in a large corpus. The meaning representation supports both equality and distance comparisons, as well as other operations from linear algebra. Some of these operations can be used for semantic comparisons and can thus be used for detection of related, and perhaps even synonymous, word pairs. Various tests of correspondence between human judgments of semantic similarity and the project’s output place it among similar systems, though not at the top.
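One common way to obtain such vector representations is to count which words co-occur within a small window and to compare the resulting vectors with cosine similarity. The Python sketch below illustrates that idea on a made-up corpus; it is an assumption for illustration, since the description does not state how the project builds its representations.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector of counts of its neighbouring words."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Made-up toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell sharply today",
]
vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["stocks"]))  # True
```

Here "cat" and "dog" share context words, while "cat" and "stocks" share none, so the first pair comes out as more similar.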


A Mobile App for Kitchen Account Management

Simon Schonhart, Christian Reichenbach, WS 2014/2015, Project Homepage

The goal of this project was to implement an Android app that can be used to manage the kitchen accounts of our staff. The app is able to manage the consumed products of a user and to debit the employee's account with the corresponding price. Moreover, the app can be used to send payment reminders to the users at regular intervals.


Automatic Recognition of Values in Wikipedia Articles

Regina König, WS 2014/2015, Project Homepage

The goal was to automatically find values in Wikipedia articles and convert them into metric units for the semantic search engine Broccoli. The value-finding component runs as one stage of a UIMA pipeline.
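The basic idea of finding values and converting them to metric units can be illustrated with a small standalone Python sketch. It is not the project's UIMA component: the regular expression, the handful of supported units, and the two-decimal output format are assumptions chosen for brevity.

```python
import re

# Conversion table for a few non-metric units (illustrative subset).
TO_METRIC = {
    "miles": ("km", 1.609344),
    "feet": ("m", 0.3048),
    "inches": ("cm", 2.54),
    "pounds": ("kg", 0.45359237),
}

# A number (optionally with decimals) followed by one of the known units.
VALUE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(miles|feet|inches|pounds)\b")

def to_metric(text):
    """Replace every recognized value in text by its metric equivalent."""
    def convert(match):
        unit, factor = TO_METRIC[match.group(2)]
        return f"{float(match.group(1)) * factor:.2f} {unit}"
    return VALUE_PATTERN.sub(convert, text)

print(to_metric("The bridge is 100 feet long and weighs 2000 pounds."))
# The bridge is 30.48 m long and weighs 907.18 kg.
```

Real article text would also require handling thousands separators, unit abbreviations, ranges, and singular forms.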


OSM Search

Tobias Faas, SS 2014, Project Homepage

The system uses the data of the OpenStreetMap project in the form of an osm.pbf file ("Protocolbuffer Binary Format"). This file contains the OSM entities in binary form, which allows faster access. In addition, the boundaries of German cities and districts are read from a text file.
The osm.pbf file is read with the help of the Osmosis library. Entities with tags (for example "shop=bakery") that are defined in the previously created ontology file are extracted and stored in an index.


Entity-Component System

Jochen Kempfle, Markus Reher, WS 2013/2014, Project Homepage

The provided Entity-Component-System was implemented as the so-called ESE-Project at the Chair of Algorithms and Data Structures at the University of Freiburg.



Ragavan Natarajan, SS 2013, Project Homepage

Wikification is the process of identifying the important phrases in a document and linking each of them to appropriate articles on Wikipedia based on their context of occurrence. The important phrases in a document are also called keyphrases, somewhat similar to the term keyword, but unlike a keyword, a keyphrase can consist of one or more words. A wikifier is a piece of software that performs the wikification of a document. One such piece of software has been developed and made available here. This page discusses how it was developed from the ground up. A more detailed report is available here.


Relation Extraction

Anton Stepan, Marius Bethge, 2012/2013, Project Homepage

This project dealt with improving the location-based data found in the YAGO knowledge base, which is used by the semantic full-text search engine Broccoli.

In order to solve this task we used the data provided by the GeoNames geographical database and composed the program GeoReader which extracts the relevant information and creates valid relation files in the format used by Broccoli.


Manual Feature Engineering with 3D Motion Capture Data

Benjamin Meier, Maria Hügle, WS 2012/2013, Project Homepage

This project is about manual feature engineering with 3D motion data recorded with the Xsens MVN 3D kinematics measurement system.


Spider Data Projector Control

Rainer Querfurth, Josua Scherzinger, WS 2012/2013, Project Homepage

The Spider VPC project is an existing software system for controlling the projectors at the Faculty of Engineering of the University of Freiburg. In particular, it consists of a centrally usable monitoring tool, an XML Creator for generating the required XMLSettings files, and the actual Spider VPC program, which is installed on the control units themselves.

The Spider VPC program uses Microsoft's .NET Micro Framework, a partial port of the .NET libraries to the world of microprocessors. For this reason, Microsoft Visual Studio, among other tools, was used for programming. The installed control units are GHI Spider Kits.

We are continuing this project as the second group, focusing on extending the monitoring tool, extending the control options of the Spider VPC, and maintaining all installed Spider units. On this page we briefly present our part of the project.


Multicriteria Multimodal Routeplanning

Robin Tibor Schirrmeister, Simon Skilevic, WS 2012, Project Homepage

The goal was to write a program that can perform shortest path queries in a hybrid network of public transportation lines and roads. These queries should use both the travel time of the path and the number of transfers as criteria, hence "multi-criteria".
The program should also be able to create some basic visualization of the Dijkstra computation.
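A multi-criteria Dijkstra search can be sketched in Python as a label-setting algorithm that keeps, per state, all labels that are not dominated in both travel time and number of transfers. The sketch below is a simplification and not the project's program: it ignores timetables and waiting times, uses a made-up edge format `(from, to, minutes, line)` where the line `"walk"` stands for a road segment, and counts a transfer whenever the traveller boards a line different from the last line used.

```python
import heapq
from collections import defaultdict

def pareto_paths(edges, source, target):
    """Return the Pareto-optimal (minutes, transfers) pairs from source to target."""
    graph = defaultdict(list)
    for u, v, minutes, line in edges:
        graph[u].append((v, minutes, line))

    settled = defaultdict(list)     # (node, last line) -> non-dominated labels
    queue = [(0, 0, source, "")]    # (minutes, transfers, node, last line used)
    results = []
    while queue:
        minutes, transfers, node, last = heapq.heappop(queue)
        if any(m <= minutes and t <= transfers for m, t in settled[(node, last)]):
            continue                # dominated by an already settled label
        settled[(node, last)].append((minutes, transfers))
        if node == target:
            # Labels arrive in order of time, so only fewer transfers can improve.
            if not results or transfers < results[-1][1]:
                results.append((minutes, transfers))
            continue
        for nxt, cost, line in graph[node]:
            if line == "walk":
                heapq.heappush(queue, (minutes + cost, transfers, nxt, last))
            else:
                changed = last != "" and line != last
                heapq.heappush(queue, (minutes + cost, transfers + changed, nxt, line))
    return results

# Made-up network: a direct tram, a faster connection with one transfer, and a long walk.
edges = [
    ("A", "B", 10, "tram1"), ("B", "C", 10, "tram1"), ("C", "D", 15, "tram1"),
    ("B", "D", 5, "bus2"),
    ("A", "D", 60, "walk"),
]
print(pareto_paths(edges, "A", "D"))  # [(15, 1), (35, 0)]
```

The result offers a trade-off: 15 minutes with one transfer, or 35 minutes without any. The 60-minute walk is dominated and therefore dropped.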


Transfer-pattern robustness

Eugen Sawin, Philip Stahl, Jonas Sternisko, SS 2012, Project Homepage

This project covers the development of an efficient route planner for public transportation networks, which is used to conduct experiments on the reliability of transfer patterns on dynamically modified networks.

Entity Recognition for Semantic Full-Text Search in Medical Journal Articles

Jan Kelch, Master's project, WS 2011/2012, Project Homepage

This project is based on a collection of medical journal articles (ZBmed). ZBmed comprises more than 1,000,000 articles from various journals. The goal of the project was to prepare the texts of the articles for the semantic search engine Broccoli. To this end, certain entities that Broccoli should take into account must be marked in the texts.


Transit Routing

Mirko Brodesser, Dirk Kienle, Thomas Liebetraut, Kyanoush Seyed Yahosseini, SS 2011, Project Homepage

While it is quite easy to find paths in street networks, incorporating transit data can be quite challenging. There are bus stops that have to be considered, but the bus does not stop all the time, so it may be better to walk. The right bus stops have to be found to use the right bus or metro and when changing trains, the algorithm should wait for an appropriate period of time so that the user does not miss the bus.


Implementation of a new algorithm for the fast intersection of unions of sorted lists

Zhongjie Cai, WS 2010/2011, Project Homepage

This master project implements a newly proposed algorithm for the fast intersection of unions of sorted lists using forward lists, presented in "Efficient Interactive Fuzzy Keyword Search" at the WWW 2009 conference in Madrid.
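To make the problem statement concrete, the following Python sketch computes the intersection of unions of sorted lists in the straightforward way: merge each group of lists into one sorted union, then intersect the unions pairwise. This is only a naive baseline for illustration, not the forward-list algorithm from the paper; the function names and the sample lists are made up.

```python
import heapq

def union_sorted(lists):
    """Merge sorted lists into one sorted list without duplicates."""
    merged = []
    for value in heapq.merge(*lists):
        if not merged or merged[-1] != value:
            merged.append(value)
    return merged

def intersect_sorted(a, b):
    """Intersect two sorted lists with a two-pointer scan."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_unions(groups):
    """Intersect the unions of several groups of sorted lists."""
    unions = [union_sorted(group) for group in groups]
    result = unions[0]
    for union in unions[1:]:
        result = intersect_sorted(result, union)
    return result

# Made-up example: each group holds the sorted id lists of one set of word variants.
groups = [
    [[1, 4, 9], [2, 4, 7]],   # union: [1, 2, 4, 7, 9]
    [[4, 5], [7, 9, 12]],     # union: [4, 5, 7, 9, 12]
]
print(intersect_unions(groups))  # [4, 7, 9]
```

The baseline materializes every union in full before intersecting, which is the cost a faster algorithm would aim to avoid.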


RNA - Google

Tuti Andriani, Thilu Chang, WS 2010/2011, Project Homepage

This master project is intended to implement a prototype for fast search in large RNA repositories with three different algorithms which have different search goals.


Implementation of basic RNA structure prediction algorithms

Li Zhang, SS 2010, Project Homepage

In this project I implemented two bioinformatics algorithms for RNA secondary structure prediction: the Nussinov algorithm and the Zuker algorithm. The basic idea of the Nussinov algorithm is to compute the maximum number of base pairs (A-U, C-G, U-G) within a given RNA string, and then to trace back from this maximum to obtain the corresponding best RNA secondary structure. The basic idea of the Zuker algorithm is to compute the minimum Gibbs free energy within a given RNA string, and then to trace back from this minimum to obtain the corresponding best RNA secondary structure.
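The Nussinov recurrence and its traceback can be sketched compactly in Python. The version below is a textbook-style illustration, not the project's code: `dp[i][j]` holds the maximum number of base pairs in the substring from position i to j, and the optional `min_loop` parameter (an addition for illustration) enforces a minimum number of unpaired bases between the two partners of a pair.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=0):
    """Return (max number of base pairs, one optimal structure in dot-bracket notation)."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]

    def paired_score(i, k, j):
        """Score of pairing k with j inside seq[i..j]: left part + 1 + enclosed part."""
        left = dp[i][k - 1] if k > i else 0
        inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
        return left + 1 + inner

    # Fill the table for increasing substring lengths.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, paired_score(i, k, j))
            dp[i][j] = best

    # Trace back from the full interval to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and dp[i][j] == paired_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (dp[0][n - 1] if n else 0), "".join(structure)

print(nussinov("GGCC"))  # (2, '(())')
```

The Zuker algorithm follows the same dynamic-programming pattern but minimizes free energy using an energy model instead of counting pairs.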



Axel Lehmann and Jens Hoffmann, Bachelor's project, 2010, Project Homepage

Daphne is an online administration and information system for university courses. The system was implemented as a database application based on the Python web framework Django.



Ico Chichkov, Claudius Korzen, Eric Lacher, Bachelor's project, 2010, Project Homepage

PDFLibraryGui (GWT)

The graphical user interface is implemented with the Google Web Toolkit. All other components are connected through it. PDFLibraryGui provides the entire web interface. It uses the other components to search the indexed papers, find hits, and assign titles to a DBLP entry.

InvertedIndex (Java)

The InvertedIndex is a program written in Java for indexing dblp.xml. This component is used to match known titles with a DBLP entry, or to find a DBLP entry for a citation.


CompleteSearch is a general-purpose search engine developed under the direction of Prof. Dr. Bast. In this project it handles the search in the DBLP database as well as in the papers added subsequently.


Movie Organizer

Mirko Brodesser, Bachelor's project 2010, Project Homepage

This site provides information about a Bachelor's project called "MovieOrganizer".
It was written at the University of Freiburg by Mirko Brodesser and supervised by Prof. Hannah Bast.
It was developed from March to April 2010 as a part-time project.
The goal was to write a program which gives users a good overview of their movies
and lets them find a subset of movies according to their own criteria.



Johannes Schwenk, Diplom project 2010, Project Homepage

This work presents the implementation of a new search function for the websites of the University of Freiburg, using the software CompleteSearch as the backend. A plugin system for integrating new sources simply and quickly is developed in Python; it aggregates structured XML data. A generic plugin for Plone content is presented, which further simplifies adding more Plone portals to the search index. Together with further plugins for sources not based on Plone, this enables search across all sources.
