Report
Authors | Fabian Panse, Maurice van Keulen, Ander de Keijzer, Norbert Ritter |
Title | Duplicate Detection in Probabilistic Data - Extended Version |
Publisher | Centre for Telematics and Information Technology, University of Twente |
Medium | CTIT technical report series, number TR-CTIT-09-44 |
Institution | Centre for Telematics and Information Technology (CTIT), University of Twente, Technical Report Series |
Date | 2009 |
Pages | 1-8 |
URL | https://research.utwente.nl/en/publications/duplicate-detection-in-probabilistic-data |
Note | Extended version of NTII2010 workshop paper |
Abstract | Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain (esp. probabilistic) source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. Furthermore, for increasing the efficiency of the duplicate detection process we introduce search space reduction methods adapted to probabilistic data. |
Last modified on 7 November 2022 at 15:36 by Dr. Fabian Panse