Former Projects

Quality of Uncertain Data

Abstract

Many real-life applications, for example data integration, data extraction, risk-management or sensor systems, naturally produce uncertain data. One of the most important goals in these applications is to produce data of high quality. This leads to the following open questions:

What does high quality exactly mean with respect to uncertainty and impreciseness?
What metrics are most qualified for quantifying the quality with respect to these means?

Currently, the most quality metrics have been defined for appropriately scoring the fineness of certain data and hence only insufficiently capture what is intuitively the quality of uncertain data. As we think, for adequately scoring quality of uncertain data new metrics for existing quality criteria as well as new quality criteria themselves are required. Moreover, as one of the most important methods for improving quality, we consider the integration of uncertain data. In this context, we focus on three elementary questions:

How to efficiently and effectively detect duplicates, if data are not only unclean but also imprecise and uncertain?
How to combine the uncertain information given by multiple duplicates so that a tuple of higher quality results?
How can the expressive modeling power of uncertain data models be used to capture uncertainty coming up during the integration process?

The QloUD project aims to develop techniques for properly scoring the quality of uncertain data as well as to develop techniques for properly integrating uncertain data.

Publications of Project

2013		Fabian Panse, Maurice van Keulen, Norbert Ritter Indeterministic Handling of Uncertain Decisions in Deduplication In: ACM Journal of Data and Information Quality
2012		Fabian Panse, Wolfram Wingerath, Steffen Friedrich, Norbert Ritter Key-based Blocking of Duplicates in Entity-Independent Probabilistic Data In: The 17th International Conference on Information Quality
2011		Fabian Panse, Norbert Ritter Incorporating Domain Knowledge and User Expertise in Probabilistic Tuple Merging In: 5th International Conference on Scalable Uncertainty Management
		Fabian Panse, Norbert Ritter Relational Data Completeness in the Presence of Maybe-Tuples In: Ingénierie des Systèmes d'Information
2009		Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter Duplicate Detection in Probabilistic Data - Extended Version In: Centre for Telematics and Information Technology (CTIT), University of Twente, Technical Report Series

7. November 2022 at 15:36 by Dr. Fabian Panse

2020		Bachelorarbeit of Daniel Kötter Entwicklung und Implementierung eines webbasierten Umfragesystems zur Erhebung realistischer fehlerbehafteter Duplikate Supervisors: Fabian Panse, Martin Poppinga
		Bachelorarbeit of Heiko Eckmann Konzeptionelle Entwicklung eines quizbasierten Systems zur Erhebung realistischer fehlerbehafteter Duplikate Supervisors: Fabian Panse, Martin Poppinga
		Bachelorarbeit of Johannes Bolduan Implementierung einer Certain-Query-Anfragebearbeitung basierend auf Consistent-Query-Answering-Algorithmen Supervisors: Fabian Panse, Felix Kiehn
2019		Master Thesis of David Zschocke Querying probabilistic databases with certain data applications Supervisors: Norbert Ritter, Fabian Panse
		Bachelorarbeit of Jan Synwoldt Generierung und Evaluierung probabilistischer Daten unter Verwendung des OCR-Tools Tesseract Supervisors: Fabian Panse, Mareike Schmidt
2018		Master Thesis of Jennifer Soltau Analyse und Klassifikation existierender Verfahren zur Kollektiven Duplikaterkennung Supervisors: Norbert Ritter, Fabian Panse
		Bachelorarbeit of Manuela Buchholz Anfragebearbeitung auf probabilistischen Datenbanken unter der Verwendung von materialisierten Welten Supervisors: Fabian Panse, Gabriel Orsini
		Master Thesis of Timm Holler Aggregate Queries On Indeterministic Deduplication Results Supervisors: Fabian Panse, Wolfgang Menzel
2016		Bachelorarbeit of Alexander Keck Faktor-basierte Approximierung der minimal und maximal möglichen Qualität eines unvollständigen Clusterings mittels genetischer Algorithmen Supervisors: Fabian Panse, Gabriel Orsini
2014		Master Thesis of Erik Meyer Erzeugung von probabilistischen und bestimmten Daten mittels des Datengenerierungswerkzeugs ProbGee Tutors: Wolfram Wingerath, Steffen Friedrich Supervisors: Norbert Ritter, Dirk Bade
2013		Bachelorarbeit of Kai Hildebrandt Ein an den Apriori-Algorithmus angelehntes Verfahren zum Finden wahrscheinlicher Duplikat-Cluster Tutor: Fabian Panse Supervisors: Norbert Ritter, Wolfgang Menzel
2012		Master Thesis of Steffen Friedrich, Wolfram Wingerath Evaluation of tuple matching methods on generated probabilistic data Tutor: Fabian Panse Supervisors: Norbert Ritter, Wolfgang Menzel
		Diploma Thesis of Lennart Helm Anpassung eines Nearest-Neighbor basierten Duplikaterkennungsverfahrens an das Konzept der probabilistischen Daten Tutor: Fabian Panse Supervisors: Norbert Ritter, Matthias Rarey
		Diploma Thesis of David Haasenleder Ein Umfragesystem zur Generierung realistischer probabilistischer Testdatenbasen Tutor: Fabian Panse Supervisors: Norbert Ritter, Guido Gryczan
2011		Bachelorarbeit of Lars Grote Entwurf und Implementierung eines adaptiven Frameworks zur Duplikatenerkennung in probabilistischen Daten Tutor: Fabian Panse Supervisors: Norbert Ritter, Axel Schmolitzky
2010		Bachelorarbeit of Steffen Friedrich, Wolfram Wingerath Search-space reduction techniques for duplicate detection in probabilistic data Tutor: Fabian Panse Supervisors: Norbert Ritter, Wolfgang Menzel