Completed Projects

Quality of Uncertain Data

Project Description

Many real-life applications, such as data integration, data extraction, risk management, and sensor systems, naturally produce uncertain data. One of the most important goals in these applications is to produce data of high quality. This leads to the following open questions:

What exactly does high quality mean with respect to uncertainty and imprecision?
Which metrics are best suited to quantifying quality in this sense?

Most quality metrics defined so far were designed to score the quality of certain data and hence only insufficiently capture what is intuitively meant by the quality of uncertain data. We believe that adequately scoring the quality of uncertain data requires both new metrics for existing quality criteria and new quality criteria altogether. Moreover, we consider the integration of uncertain data to be one of the most important methods for improving quality. In this context, we focus on three elementary questions:

How can duplicates be detected efficiently and effectively if data are not only unclean but also imprecise and uncertain?
How can the uncertain information given by multiple duplicates be combined so that a tuple of higher quality results?
How can the expressive modeling power of uncertain data models be used to capture the uncertainty that arises during the integration process?

The QloUD project aims to develop techniques for properly scoring the quality of uncertain data as well as techniques for properly integrating uncertain data.

Project Publications

2013		Fabian Panse, Maurice van Keulen, Norbert Ritter: Indeterministic Handling of Uncertain Decisions in Deduplication. In: ACM Journal of Data and Information Quality
2012		Fabian Panse, Wolfram Wingerath, Steffen Friedrich, Norbert Ritter: Key-based Blocking of Duplicates in Entity-Independent Probabilistic Data. In: The 17th International Conference on Information Quality
2011		Fabian Panse, Norbert Ritter: Incorporating Domain Knowledge and User Expertise in Probabilistic Tuple Merging. In: 5th International Conference on Scalable Uncertainty Management
		Fabian Panse, Norbert Ritter: Relational Data Completeness in the Presence of Maybe-Tuples. In: Ingénierie des Systèmes d'Information
2009		Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter: Duplicate Detection in Probabilistic Data - Extended Version. In: Centre for Telematics and Information Technology (CTIT), University of Twente, Technical Report Series

On 7 November 2022 at 15:36 by Dr. Fabian Panse

2020		Bachelor's thesis by Daniel Kötter: Entwicklung und Implementierung eines webbasierten Umfragesystems zur Erhebung realistischer fehlerbehafteter Duplikate. Reviewers: Fabian Panse, Martin Poppinga
		Bachelor's thesis by Heiko Eckmann: Konzeptionelle Entwicklung eines quizbasierten Systems zur Erhebung realistischer fehlerbehafteter Duplikate. Reviewers: Fabian Panse, Martin Poppinga
		Bachelor's thesis by Johannes Bolduan: Implementierung einer Certain-Query-Anfragebearbeitung basierend auf Consistent-Query-Answering-Algorithmen. Reviewers: Fabian Panse, Felix Kiehn
2019		Master's thesis by David Zschocke: Querying probabilistic databases with certain data applications. Reviewers: Norbert Ritter, Fabian Panse
		Bachelor's thesis by Jan Synwoldt: Generierung und Evaluierung probabilistischer Daten unter Verwendung des OCR-Tools Tesseract. Reviewers: Fabian Panse, Mareike Schmidt
2018		Master's thesis by Jennifer Soltau: Analyse und Klassifikation existierender Verfahren zur Kollektiven Duplikaterkennung. Reviewers: Norbert Ritter, Fabian Panse
		Bachelor's thesis by Manuela Buchholz: Anfragebearbeitung auf probabilistischen Datenbanken unter der Verwendung von materialisierten Welten. Reviewers: Fabian Panse, Gabriel Orsini
		Master's thesis by Timm Holler: Aggregate Queries On Indeterministic Deduplication Results. Reviewers: Fabian Panse, Wolfgang Menzel
2016		Bachelor's thesis by Alexander Keck: Faktor-basierte Approximierung der minimal und maximal möglichen Qualität eines unvollständigen Clusterings mittels genetischer Algorithmen. Reviewers: Fabian Panse, Gabriel Orsini
2014		Master's thesis by Erik Meyer: Erzeugung von probabilistischen und bestimmten Daten mittels des Datengenerierungswerkzeugs ProbGee. Supervisors: Wolfram Wingerath, Steffen Friedrich. Reviewers: Norbert Ritter, Dirk Bade
2013		Bachelor's thesis by Kai Hildebrandt: Ein an den Apriori-Algorithmus angelehntes Verfahren zum Finden wahrscheinlicher Duplikat-Cluster. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Wolfgang Menzel
2012		Master's thesis by Steffen Friedrich, Wolfram Wingerath: Evaluation of tuple matching methods on generated probabilistic data. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Wolfgang Menzel
		Diplom thesis by Lennart Helm: Anpassung eines Nearest-Neighbor basierten Duplikaterkennungsverfahrens an das Konzept der probabilistischen Daten. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Matthias Rarey
		Diplom thesis by David Haasenleder: Ein Umfragesystem zur Generierung realistischer probabilistischer Testdatenbasen. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Guido Gryczan
2011		Bachelor's thesis by Lars Grote: Entwurf und Implementierung eines adaptiven Frameworks zur Duplikatenerkennung in probabilistischen Daten. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Axel Schmolitzky
2010		Bachelor's thesis by Steffen Friedrich, Wolfram Wingerath: Search-space reduction techniques for duplicate detection in probabilistic data. Supervisor: Fabian Panse. Reviewers: Norbert Ritter, Wolfgang Menzel