
How to calculate the Hamming distance between two datasets of the same points? Both datasets look exactly the same; only the clustering differs.
http://postimg.org/image/u11qnsolh/

There are two datasets with the same number of points.
Total number of points: 19


The first dataset has 3 clusters:
Cluster A has 4 points in it
Cluster B has 2 points in it
Cluster C has 4 points in it

The remaining 9 points are outside the clusters.


The second dataset has 3 clusters:
Cluster A has 8 points in it
Cluster B has 5 points in it
Cluster C has 6 points in it

Irfan

Answer

First, let's index the points:

[Image: the 19 points, labeled 1 to 19]

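You are interested in the Hamming distance between two sets of sets (each clustering is a set of clusters, and every point outside a cluster forms a singleton cluster):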

L = { {1,2,3,4}, {5,6}, {7}, {8}, {9}, {10}, {11}, {12}, {13}, {14,15,17,18}, {16}, {19} }

R = { {1,2,3,4,5,6,7,8}, {9,10,11,12,13}, {14,15,16,17,18,19} }
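A minimal Python sketch of this representation, assuming each clustering is stored as a set of `frozenset`s and that every unclustered point becomes a singleton cluster. The helper `clustering` is hypothetical (it is not from the question or the references); the point indices follow the labeling above.

```python
def clustering(clusters, n_points):
    """Return a clustering of points 1..n_points as a set of frozensets:
    the given clusters plus one singleton for each point left unclustered."""
    clustered = set().union(*clusters)
    singletons = [{p} for p in range(1, n_points + 1) if p not in clustered]
    return {frozenset(c) for c in list(clusters) + singletons}


# First dataset: clusters A, B, C of sizes 4, 2, 4; the other 9 points are singletons.
L = clustering([{1, 2, 3, 4}, {5, 6}, {14, 15, 17, 18}], 19)

# Second dataset: clusters of sizes 8, 5, 6 cover all 19 points.
R = clustering([set(range(1, 9)), set(range(9, 14)), set(range(14, 20))], 19)

print(len(L), len(R))  # 12 3
```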


Adapting from [ 1 ] (section 2), which generalizes the Hamming distance to finite sets, the distance between two sets X and Y can be defined as:

$$ d(X,Y) = \frac{|(X - Y) \cup (Y - X)|}{|X \cup Y|} $$
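As a quick check of the formula itself, here is a sketch that evaluates it with Python's ordinary set union and difference. Using the built-in operations is an assumption for illustration only; the answer goes on to use the set-of-sets operations from [ 2 ]. The names `set_distance` and `fs` are hypothetical.

```python
def set_distance(X, Y):
    """d(X, Y) = |(X - Y) ∪ (Y - X)| / |X ∪ Y| with ordinary set operations."""
    union = X | Y
    if not union:
        return 0.0  # assumption: two empty collections are at distance 0
    return len((X - Y) | (Y - X)) / len(union)


def fs(*groups):
    """Build a set of frozensets from iterables of point indices."""
    return {frozenset(g) for g in groups}


L = fs({1, 2, 3, 4}, {5, 6}, {7}, {8}, {9}, {10}, {11}, {12}, {13},
       {14, 15, 17, 18}, {16}, {19})
R = fs(range(1, 9), range(9, 14), range(14, 20))

print(set_distance(L, L))  # 0.0
print(set_distance(L, R))  # 1.0
```

With ordinary set operations no cluster of L is equal to a cluster of R, so the two clusterings come out maximally far apart (1.0) even though every cluster of L lies inside a cluster of R.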

Adapting from [ 2 ] (section 3.4), the union and the difference between two sets of sets can be defined as:

[Image: definition of the union of two sets of sets, from [ 2 ]]

and

[Image: definition of the difference of two sets of sets, from [ 2 ]]
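One reading of these two definitions that reproduces the values computed in this answer is to combine only clusters that overlap: X ⋃ Y = { x ∪ y : x ∈ X, y ∈ Y, x ∩ y ≠ ∅ } and X − Y = { x − y : x ∈ X, y ∈ Y, x ∩ y ≠ ∅ }. This is an assumption reconstructed from the results, not a quotation of [ 2 ], and the function names below are hypothetical.

```python
def overlapping_pairs(X, Y):
    """Pairs (x, y), x in X and y in Y, that share at least one point.
    Assumption: only overlapping clusters are combined."""
    return [(x, y) for x in X for y in Y if x & y]


def cluster_union(X, Y):
    return {x | y for x, y in overlapping_pairs(X, Y)}


def cluster_difference(X, Y):
    return {x - y for x, y in overlapping_pairs(X, Y)}


def fs(*groups):
    return {frozenset(g) for g in groups}


L = fs({1, 2, 3, 4}, {5, 6}, {7}, {8}, {9}, {10}, {11}, {12}, {13},
       {14, 15, 17, 18}, {16}, {19})
R = fs(range(1, 9), range(9, 14), range(14, 20))

print(cluster_union(L, R) == R)                   # True: every cluster of L lies inside one of R
print(cluster_difference(L, R) == {frozenset()})  # True: L - R = { {} }
print(len(cluster_difference(R, L)))              # 12
```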

So, in your case:

L ⋃ R = { {1,2,3,4,5,6,7,8}, {9,10,11,12,13}, {14,15,16,17,18,19} }

L - R = { {} }

R - L = { {5,6,7,8}, {1,2,3,4,7,8}, {1,2,3,4,5,6,8}, {1,2,3,4,5,6,7}, {10,11,12,13}, {9,11,12,13}, {9,10,12,13}, {9,10,11,13}, {9,10,11,12}, {16,19}, {14,15,17,18,19}, {14,15,16,17,18} }

(L-R) ⋃ (R-L) = { {}, {5,6,7,8}, {1,2,3,4,7,8}, {1,2,3,4,5,6,8}, {1,2,3,4,5,6,7}, {10,11,12,13}, {9,11,12,13}, {9,10,12,13}, {9,10,11,13}, {9,10,11,12}, {16,19}, {14,15,17,18,19}, {14,15,16,17,18} }

so

|(L-R) ⋃ (R-L)| = 13

and

|L ⋃ R| = 3

so d(L,R) = 13 / 3 ≈ 4.333
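Putting the pieces together, a self-contained sketch of the whole computation, under an assumed overlap-based reading of the union and difference from [ 2 ] (only clusters that share a point are combined). The numerator uses the ordinary union of the two differences, matching the steps above, and `fractions.Fraction` keeps the result exact. All function names are hypothetical.

```python
from fractions import Fraction


def overlapping_pairs(X, Y):
    # Assumption: only clusters that share at least one point are combined.
    return [(x, y) for x in X for y in Y if x & y]


def cluster_union(X, Y):
    return {x | y for x, y in overlapping_pairs(X, Y)}


def cluster_difference(X, Y):
    return {x - y for x, y in overlapping_pairs(X, Y)}


def clustering_distance(X, Y):
    """d(X, Y) = |(X - Y) ∪ (Y - X)| / |X ⋃ Y| with the cluster-wise operations."""
    denominator = len(cluster_union(X, Y))
    if denominator == 0:
        raise ValueError("no overlapping clusters; distance undefined")
    numerator = len(cluster_difference(X, Y) | cluster_difference(Y, X))
    return Fraction(numerator, denominator)


def fs(*groups):
    return {frozenset(g) for g in groups}


L = fs({1, 2, 3, 4}, {5, 6}, {7}, {8}, {9}, {10}, {11}, {12}, {13},
       {14, 15, 17, 18}, {16}, {19})
R = fs(range(1, 9), range(9, 14), range(14, 20))

d = clustering_distance(L, R)
print(d, float(d))  # 13/3 4.333333333333333
```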


[ 1 ] Generalizing Hamming Distance to Finite Sets to the purpose of classifying heterogeneous objects [Bezem, Keijzer, Volmac]

[ 2 ] Pattern Matching in Conceptual Models – A Formal Multi-Modelling Language Approach [Delfmann, Herwig, Lis, Stein]

Lior Kogan