I'm calculating pairwise Hamming distances between the rows of a data frame that contains strings. Some of the cells contain two comma-separated values.
I would like to calculate hamming distance as follows:
if row1 = "d,e" and row2 = "d", Hd(row1, row2) is
(Hd("d", "d") + Hd("e", "d")) / (length(row1) + length(row2))
= (0 + 1) / (2 + 1) = 1/3 ≈ 0.33
or
if row1 = "d,e" and row2 = "d,f", Hd(row1, row2) is
(Hd("d", "d") + Hd("d", "f") + Hd("e", "d") + Hd("e", "f")) / (length(row1) + length(row2))
= (0 + 1 + 1 + 1) / (2 + 2) = 3/4 = 0.75
I've managed to calculate the Hamming distance between cells that contain only one value, but I'm stuck on cells that contain more than one value.
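
To make the intended rule concrete, here is a minimal Python sketch for a single pair of cells. The function name `cell_distance` and the choice to split on commas are assumptions made for illustration. It compares every value in one cell with every value in the other, counts the mismatches, and divides by the total number of values in both cells.

```python
from itertools import product


def cell_distance(cell_a: str, cell_b: str, sep: str = ",") -> float:
    """Mismatches over all value pairs, divided by the total number of values.

    Hypothetical helper: assumes multi-valued cells are separated by `sep`.
    """
    values_a = [v.strip() for v in cell_a.split(sep)]
    values_b = [v.strip() for v in cell_b.split(sep)]
    # Hd on single values is 0 if they are equal and 1 otherwise.
    mismatches = sum(a != b for a, b in product(values_a, values_b))
    return mismatches / (len(values_a) + len(values_b))


print(cell_distance("d,e", "d"))    # (0 + 1) / (2 + 1), about 0.33
print(cell_distance("d,e", "d,f"))  # (0 + 1 + 1 + 1) / (2 + 2) = 0.75
```

Note that, taken literally, this formula gives 1 / (1 + 1) = 0.5 for two different single values rather than 1, so single-valued cells may need separate handling if they should keep the usual 0/1 distance.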