QloUD - Quality of Uncertain Data
QloUD is a joint project of the database groups of the
University of Hamburg (Germany) and
the University of Twente (The Netherlands).
Motivation
Many real-life applications, for example data integration, data extraction, risk management, or sensor systems,
naturally produce uncertain data.
One of the most important goals in these applications is to produce data of high quality.
This leads to the following open questions:
- What exactly does high quality mean with respect to uncertainty and imprecision?
- Which metrics are best suited to quantifying quality in this sense?
Currently, most quality metrics have been defined for scoring the quality of certain data
and hence only insufficiently capture what is intuitively understood as
the quality of uncertain data.
We believe that adequately scoring the quality of uncertain data requires new metrics for existing quality criteria
as well as new quality criteria themselves.
Moreover, we consider the integration of uncertain data to be one of the most important methods for improving quality.
In this context, we focus on three elementary questions:
- How can duplicates be detected efficiently and effectively if data are not only unclean but also imprecise and uncertain?
- How can the uncertain information given by multiple duplicates be combined so that a tuple of higher quality results?
- How can the expressive power of uncertain data models be used to capture uncertainty arising during the integration process?
The QloUD project aims to develop techniques for properly scoring the quality of uncertain data as well as
to develop techniques for properly integrating uncertain data.
News
Initial information on our probabilistic data generator ProbDataGen is now available.
Publications
Key-based Blocking of Duplicates in Entity-Independent Probabilistic Data (ICIQ 2012)
Evaluating Indeterministic Duplicate Detection Results (SUM 2012)
Indeterministic Handling of Uncertain Decisions in Deduplication (JDIQ)