Experimental Test Data (Derivation) |
| |
|
Experiment 1: Robustness against a varying Parameter Setting
Movie Databases generated with standard data setting | (HSQL)
|
DSC1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
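
The figures above are consistent with "Number of Duplicates" counting duplicate pairs rather than clusters or records: a cluster of n x-tuples contains n(n-1)/2 unordered pairs, and summing that over the listed distribution gives 4,380. This reading is an inference from the numbers, not something the page states. The following Python sketch reproduces the check; the function names are illustrative only.

```python
# Cluster-size distribution as listed for DSC1: {cluster size: frequency}.
CLUSTER_SIZES = {2: 1560, 3: 232, 4: 72, 5: 31, 6: 15, 7: 11, 8: 9,
                 9: 5, 10: 3, 11: 2, 12: 1, 13: 1, 15: 1}


def duplicate_pairs(distribution):
    """Count unordered pairs of x-tuples that fall into the same cluster."""
    return sum(freq * (size * (size - 1) // 2)
               for size, freq in distribution.items())


def cluster_count(distribution):
    """Count the duplicate clusters (each holds at least two x-tuples)."""
    return sum(distribution.values())


def clustered_tuples(distribution):
    """Count the x-tuples that belong to some duplicate cluster."""
    return sum(size * freq for size, freq in distribution.items())


if __name__ == "__main__":
    print(duplicate_pairs(CLUSTER_SIZES))   # 4380, matches "Number of Duplicates"
    print(cluster_count(CLUSTER_SIZES))     # 1943
    print(clustered_tuples(CLUSTER_SIZES))  # 4635
```

Under this reading, 4,635 of the 102,692 x-tuples take part in at least one duplicate relationship, grouped into 1,943 clusters.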
|
|
|
|
|
|
Experiment 2: Robustness against a varying Dirtiness of the Source Data
Movie Databases | (HSQL)
|
DSH1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.72
|
|
|
Movie Databases | (HSQL)
|
DSF1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.77
|
|
|
Movie Databases | (HSQL)
|
DSE1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.81
|
|
|
Movie Databases generated with standard data setting | (HSQL)
|
DSC1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSA1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.9
|
|
|
Movie Databases | (HSQL)
|
DSI1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.933
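
The datasets in this experiment differ only in the average similarity of true duplicates, which ranges from 0.72 (DSH1) to 0.933 (DSI1). As a reminder of what a Monge-Elkan score measures, here is a minimal Python sketch of the standard token-level definition: each token of the first string is matched with its most similar token in the second, and the best scores are averaged. The inner token similarity used here (`difflib.SequenceMatcher`) is a stand-in chosen for illustration; the page does not say which inner measure was used, nor how scores were aggregated over attributes and alternatives.

```python
from difflib import SequenceMatcher


def token_similarity(a, b):
    """Stand-in inner similarity in [0, 1]; an assumption, not the measure used for the datasets."""
    return SequenceMatcher(None, a, b).ratio()


def monge_elkan(s, t, inner=token_similarity):
    """Average, over the tokens of s, of the best inner similarity to any token of t."""
    tokens_s, tokens_t = s.split(), t.split()
    if not tokens_s or not tokens_t:
        return 0.0
    best_matches = (max(inner(a, b) for b in tokens_t) for a in tokens_s)
    return sum(best_matches) / len(tokens_s)


if __name__ == "__main__":
    # Token order does not matter: every token finds an exact partner.
    print(monge_elkan("the matrix", "matrix the"))
    # A partially overlapping title scores strictly between 0 and 1.
    print(monge_elkan("the matrix", "matrix reloaded"))
```

Note that the measure is asymmetric in general: `monge_elkan(s, t)` averages over the tokens of `s`, so swapping the arguments can change the score.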
|
|
|
|
|
|
Experiment 3: Robustness against a varying Data Uncertainty
Movie Databases generated with standard data setting | (HSQL)
|
DSC1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_15A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 812,272
|
Maximal Number of Alternatives per X-Tuple: | 15
|
Average Number of Alternatives per X-Tuple: | 7.91
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_20A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 1,057,191
|
Maximal Number of Alternatives per X-Tuple: | 20
|
Average Number of Alternatives per X-Tuple: | 10.29
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_25A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 1,301,207
|
Maximal Number of Alternatives per X-Tuple: | 25
|
Average Number of Alternatives per X-Tuple: | 12.67
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
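
Across the four datasets of this experiment only the amount of uncertainty changes: the number of x-tuples stays at 102,692 while the total number of alternatives grows. The listed averages follow directly from dividing the total by the number of x-tuples, which the short Python sketch below recomputes. The dictionary and function names are illustrative; the figures are the ones listed above.

```python
X_TUPLES = 102_692

# Total number of alternatives per dataset, as listed above.
TOTAL_ALTERNATIVES = {
    "DSC1": 561_025,
    "DSC1_15A": 812_272,
    "DSC1_20A": 1_057_191,
    "DSC1_25A": 1_301_207,
}


def average_alternatives(total, x_tuples=X_TUPLES):
    """Average number of alternatives per x-tuple, rounded to two decimals."""
    return round(total / x_tuples, 2)


if __name__ == "__main__":
    for name, total in TOTAL_ALTERNATIVES.items():
        print(name, average_alternatives(total))
    # DSC1 5.46
    # DSC1_15A 7.91
    # DSC1_20A 10.29
    # DSC1_25A 12.67
```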
|
|
|
|
|
|
Experiment 4: Efficiency Improvement w.r.t. a varying Dirtiness of the Source Data
Movie Databases | (HSQL)
|
DSH1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.72
|
|
|
Movie Databases | (HSQL)
|
DSF1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.77
|
|
|
Movie Databases | (HSQL)
|
DSE1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.81
|
|
|
Movie Databases generated with standard data setting | (HSQL)
|
DSC1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSA1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.9
|
|
|
Movie Databases | (HSQL)
|
DSI1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.933
|
|
|
|
|
|
Experiment 5: Efficiency Improvement w.r.t. a varying Data Uncertainty
Movie Databases generated with standard data setting | (HSQL)
|
DSC1 [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 561,025
|
Maximal Number of Alternatives per X-Tuple: | 10
|
Average Number of Alternatives per X-Tuple: | 5.46
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_15A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 812,272
|
Maximal Number of Alternatives per X-Tuple: | 15
|
Average Number of Alternatives per X-Tuple: | 7.91
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_20A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 1,057,191
|
Maximal Number of Alternatives per X-Tuple: | 20
|
Average Number of Alternatives per X-Tuple: | 10.29
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|
Movie Databases | (HSQL)
|
DSC1_25A [zip] |
|
Characteristics: |
|
Number of X-Tuples: | 102,692
|
Total Number of Alternatives: | 1,301,207
|
Maximal Number of Alternatives per X-Tuple: | 25
|
Average Number of Alternatives per X-Tuple: | 12.67
|
Number of Duplicates: | 4,380
|
Distribution of Cluster Sizes (cluster size, frequency): | 2,1560; 3,232; 4,72; 5,31; 6,15; 7,11; 8,9; 9,5; 10,3; 11,2; 12,1; 13,1; 15,1
|
Average Similarity of True Duplicates (scored with Monge-Elkan distance): | 0.856
|
|
|