QloUD - Quality of Uncertain Data

QloUD is a joint project of the database groups of the University of Hamburg (Germany) and the University of Twente (The Netherlands).

Motivation

Many real-life applications, for example data integration, data extraction, risk management, or sensor systems, naturally produce uncertain data. One of the most important goals in these applications is to produce data of high quality. This leads to the following open questions:
  • What exactly does high quality mean with respect to uncertainty and imprecision?
  • Which metrics are best suited for quantifying quality in this respect?
Currently, most quality metrics have been defined for scoring the quality of certain data and hence capture only insufficiently what is intuitively the quality of uncertain data. We believe that adequately scoring the quality of uncertain data requires new metrics for existing quality criteria as well as entirely new quality criteria.

Moreover, we consider the integration of uncertain data to be one of the most important methods for improving data quality. In this context, we focus on three elementary questions:

  • How can duplicates be detected efficiently and effectively if the data are not only unclean but also imprecise and uncertain?
  • How can the uncertain information given by multiple duplicates be combined so that a tuple of higher quality results (see the sketch below)?
  • How can the expressive modeling power of uncertain data models be used to capture uncertainty arising during the integration process?
The QloUD project aims to develop techniques for properly scoring the quality of uncertain data as well as techniques for properly integrating such data.
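
To give an impression of what combining the uncertain information of multiple duplicates can mean in practice (the second question above), the following is a minimal, purely illustrative sketch in Python. It is not the technique developed in QloUD; the merge_duplicates function, the averaging-and-renormalizing scheme, and the example values are assumptions made only for illustration.

  from collections import defaultdict

  def merge_duplicates(dists):
      """Combine attribute-level probability distributions of duplicate tuples.

      `dists` is a list of dictionaries, one per duplicate, each mapping a
      candidate attribute value to its probability. This toy scheme averages
      the probability of every candidate value over all duplicates and then
      renormalizes, so values confirmed by several duplicates receive a
      higher probability in the merged tuple.
      """
      merged = defaultdict(float)
      for dist in dists:
          for value, prob in dist.items():
              merged[value] += prob / len(dists)
      total = sum(merged.values())
      return {value: prob / total for value, prob in merged.items()}

  # Two duplicate records that are both uncertain about a person's city.
  dup1 = {"Hamburg": 0.7, "Bremen": 0.3}
  dup2 = {"Hamburg": 0.9, "Enschede": 0.1}

  print(merge_duplicates([dup1, dup2]))
  # roughly {'Hamburg': 0.8, 'Bremen': 0.15, 'Enschede': 0.05}

In this toy setting, the value "Hamburg", which both duplicates consider likely, dominates the merged distribution; this is one intuitive sense in which the resulting tuple has higher quality than either duplicate alone.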

News

First information on our probabilistic data generator ProbDataGen is now available

People

Fabian Panse (University of Hamburg)
Norbert Ritter (University of Hamburg)
Maurice van Keulen (University of Twente)
Ander de Keijzer (University of Twente)

 [ more ]

Publications

Key-based Blocking of Duplicates in Entity-Independent Probabilistic Data (ICIQ 2012)
Evaluating Indeterministic Duplicate Detection Results (SUM 2012)
Indeterministic Handling of Uncertain Decisions in Deduplication (JDIQ)

 [ more ]

Downloads