
I'm calculating the pairwise Hamming distance between rows of a data frame that contains strings. Some of the cells contain two comma-separated values.

I would like to calculate the Hamming distance as follows:

If row1 = "d,e" and row2 = "d", then Hd(row1, row2) is
(Hd("d", "d") + Hd("e", "d")) / (length(row1) + length(row2))
= (0 + 1) / (2 + 1) = 1/3 ≈ 0.33

or

If row1 = "d,e" and row2 = "d,f", then Hd(row1, row2) is
(Hd("d", "d") + Hd("d", "f") + Hd("e", "d") + Hd("e", "f")) / (length(row1) + length(row2))
= (0 + 1 + 1 + 1) / (2 + 2) = 3/4 = 0.75
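Both examples appear to follow one general rule. Writing A and B for the sets of values in the two cells (notation introduced here for illustration), and taking the distance between two single values as 1 if they differ and 0 if they are equal, the intended distance seems to be:

```latex
\mathrm{Hd}(A, B) = \frac{\sum_{a \in A} \sum_{b \in B} [a \neq b]}{|A| + |B|}
```

where [a ≠ b] is 1 when the two values differ and 0 otherwise, and |A|, |B| are the numbers of values in each cell.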

I've managed to calculate the Hamming distance between cells that contain only one value, but I'm stuck on cells that contain more than one value.
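A minimal Python sketch of the rule described in the examples, assuming cells are comma-separated strings and assuming that per-column cell distances are summed to give a row distance (the question does not say how columns should be combined). The names `cell_distance`, `row_distance` and `pairwise_distances` are hypothetical.

```python
from itertools import combinations


def cell_distance(cell_a: str, cell_b: str, sep: str = ",") -> float:
    """Distance between two cells that may each hold several values.

    Every value in cell_a is compared with every value in cell_b; each
    mismatching pair adds 1, and the total is divided by the combined
    number of values in both cells (the formula from the question).
    """
    values_a = [v.strip() for v in cell_a.split(sep)]
    values_b = [v.strip() for v in cell_b.split(sep)]
    mismatches = sum(a != b for a in values_a for b in values_b)
    return mismatches / (len(values_a) + len(values_b))


def row_distance(row_a, row_b) -> float:
    # Assumption: the row distance is the sum of the per-column cell distances.
    return sum(cell_distance(a, b) for a, b in zip(row_a, row_b))


def pairwise_distances(rows):
    """Symmetric matrix of row distances; the diagonal stays 0."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = row_distance(rows[i], rows[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    print(cell_distance("d,e", "d"))    # 1/3, about 0.33
    print(cell_distance("d,e", "d,f"))  # 3/4 = 0.75
```

If the data lives in a pandas data frame, `df.astype(str).values.tolist()` should give the list of rows this sketch expects. Note that the formula, applied literally, gives 0.5 rather than 1 for two different single values (1 mismatch divided by 1 + 1); if single-value cells should keep their usual 0/1 Hamming distance, that case would need separate handling.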

Phil
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Apr 30 '23 at 21:19
