Experimental Test Data (Blocking) |
| |
 |
Experiment 1: Overall Comparison of Adaptation Strategies
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
|
| |
| |
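The characteristics listed above can be cross-checked against each other. The following is a minimal sketch, assuming that "Number of Duplicates" counts pairs of x-tuples that refer to the same entity, so that a cluster of size n contributes n(n-1)/2 pairs. That reading is an interpretation rather than something the page states, and the function names are hypothetical, but under it the cluster size distribution reproduces the listed figure.

```python
from math import comb

# (cluster size, frequency) pairs copied from the characteristics above.
CLUSTER_DISTRIBUTION = [
    (2, 1560), (3, 232), (4, 72), (5, 31), (6, 15), (7, 11), (8, 9),
    (9, 5), (10, 3), (11, 2), (12, 1), (13, 1), (15, 1),
]


def count_duplicate_pairs(distribution):
    """Sum the pairs inside every cluster: a cluster of size n holds C(n, 2) pairs."""
    return sum(comb(size, 2) * freq for size, freq in distribution)


def count_clusters(distribution):
    """Number of duplicate clusters (clusters with at least two x-tuples)."""
    return sum(freq for _, freq in distribution)


def count_clustered_tuples(distribution):
    """Number of x-tuples that belong to some duplicate cluster."""
    return sum(size * freq for size, freq in distribution)


if __name__ == "__main__":
    print("duplicate pairs: ", count_duplicate_pairs(CLUSTER_DISTRIBUTION))  # 4380
    print("clusters:        ", count_clusters(CLUSTER_DISTRIBUTION))         # 1943
    print("clustered tuples:", count_clustered_tuples(CLUSTER_DISTRIBUTION)) # 4635
```

Since every data setting on this page lists the same distribution, the same check applies to each of them.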
Experiment 2: Robustness against a varying Key Design
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
|
| |
| |
Experiment 3: Robustness against a varying Setting of Blocking Parameters
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
|
| |
| |
Experiment 4: Robustness against a varying Dirtiness of the Source Data
| Movie Databases | (HSQL)
|
| DSH1 [zip] |
|
| DSH2 [zip] |
|
| DSH3 [zip] |
|
| DSH4 [zip] |
|
| DSH5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.72
|
| |
| |
| Movie Databases | (HSQL)
|
| DSF1 [zip] |
|
| DSF2 [zip] |
|
| DSF3 [zip] |
|
| DSF4 [zip] |
|
| DSF5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.77
|
| |
| |
| Movie Databases | (HSQL)
|
| DSE1 [zip] |
|
| DSE2 [zip] |
|
| DSE3 [zip] |
|
| DSE4 [zip] |
|
| DSE5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.81
|
| |
| |
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSA1 [zip] |
|
| DSA2 [zip] |
|
| DSA3 [zip] |
|
| DSA4 [zip] |
|
| DSA5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.9
|
| |
| |
| Movie Databases | (HSQL)
|
| DSI1 [zip] |
|
| DSI2 [zip] |
|
| DSI3 [zip] |
|
| DSI4 [zip] |
|
| DSI5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.933
|
| |
| |
|
| |
| |
Experiment 5: Robustness against a varying Data Uncertainty
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_15A [zip] |
|
| DSC2_15A [zip] |
|
| DSC3_15A [zip] |
|
| DSC4_15A [zip] |
|
| DSC5_15A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 812,272
|
| Maximal Number of Alternatives per X-Tuple: | 15
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_20A [zip] |
|
| DSC2_20A [zip] |
|
| DSC3_20A [zip] |
|
| DSC4_20A [zip] |
|
| DSC5_20A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 1,057,191
|
| Maximal Number of Alternatives per X-Tuple: | 20
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_25A [zip] |
|
| DSC2_25A [zip] |
|
| DSC3_25A [zip] |
|
| DSC4_25A [zip] |
|
| DSC5_25A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 1,301,207
|
| Maximal Number of Alternatives per X-Tuple: | 25
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
|
| |
| |
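The four data settings of Experiment 5 keep the number of x-tuples fixed and vary only the number of alternatives. As a rough illustration of how much uncertainty each setting carries, the sketch below derives the average number of alternatives per x-tuple from the totals listed above. The constant and function names are hypothetical, and the averages are computed here rather than taken from the page.

```python
# Number of x-tuples, identical for all four data settings of Experiment 5.
X_TUPLES = 102_692

# Maximal number of alternatives per x-tuple -> total number of alternatives,
# copied from the characteristics above.
TOTAL_ALTERNATIVES = {
    10: 561_025,
    15: 812_272,
    20: 1_057_191,
    25: 1_301_207,
}


def average_alternatives(total_alternatives, x_tuples=X_TUPLES):
    """Average number of alternatives per x-tuple for one data setting."""
    return total_alternatives / x_tuples


if __name__ == "__main__":
    for max_alternatives, total in TOTAL_ALTERNATIVES.items():
        avg = average_alternatives(total)
        print(f"max {max_alternatives:>2}: {avg:.2f} alternatives per x-tuple on average")
```

In each setting the average comes out at roughly half of the maximum, so raising the maximum scales the overall amount of uncertainty about proportionally.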
Experiment 6: Uncertain Keys First
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_15A [zip] |
|
| DSC2_15A [zip] |
|
| DSC3_15A [zip] |
|
| DSC4_15A [zip] |
|
| DSC5_15A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 812,272
|
| Maximal Number of Alternatives per X-Tuple: | 15
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_20A [zip] |
|
| DSC2_20A [zip] |
|
| DSC3_20A [zip] |
|
| DSC4_20A [zip] |
|
| DSC5_20A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 1,057,191
|
| Maximal Number of Alternatives per X-Tuple: | 20
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSC1_25A [zip] |
|
| DSC2_25A [zip] |
|
| DSC3_25A [zip] |
|
| DSC4_25A [zip] |
|
| DSC5_25A [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 1,301,207
|
| Maximal Number of Alternatives per X-Tuple: | 25
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
|
| |
| |
Experiment 7: Overall Comparison using Different Blocking Techniques
| Movie Databases | (HSQL)
|
| DSH1 [zip] |
|
| DSH2 [zip] |
|
| DSH3 [zip] |
|
| DSH4 [zip] |
|
| DSH5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.72
|
| |
| |
| Movie Databases | (HSQL)
|
| DSF1 [zip] |
|
| DSF2 [zip] |
|
| DSF3 [zip] |
|
| DSF4 [zip] |
|
| DSF5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.77
|
| |
| |
| Movie Databases | (HSQL)
|
| DSE1 [zip] |
|
| DSE2 [zip] |
|
| DSE3 [zip] |
|
| DSE4 [zip] |
|
| DSE5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.81
|
| |
| |
| Movie Databases generated with standard data setting | (HSQL)
|
| DSC1 [zip] |
|
| DSC2 [zip] |
|
| DSC3 [zip] |
|
| DSC4 [zip] |
|
| DSC5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
| |
| |
| Movie Databases | (HSQL)
|
| DSA1 [zip] |
|
| DSA2 [zip] |
|
| DSA3 [zip] |
|
| DSA4 [zip] |
|
| DSA5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.9
|
| |
| |
| Movie Databases | (HSQL)
|
| DSI1 [zip] |
|
| DSI2 [zip] |
|
| DSI3 [zip] |
|
| DSI4 [zip] |
|
| DSI5 [zip] |
|
| Characteristics: |
|
| Number of X-Tuples: | 102,692
|
| Total Number of Alternatives: | 561,025
|
| Maximal Number of Alternatives per X-Tuple: | 10
|
| Number of Duplicates: | 4,380
|
| Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
| Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.933
|
| |
| |
|
| |
| |