11

I read the Wikipedia article on Hamming Weight and noticed something interesting:

It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, popcount or sideways sum.

[emphasis mine]

So something occurred to me. Could I calculate the Hamming Distance between two strings by XORing them and then taking the Hamming Weight (POPCOUNT) of the resulting string?

Something along the lines of this (using gcc intrinsics):

#include <stdint.h>

int hammingDistance (uint64_t x, uint64_t y) {
        uint64_t res = x ^ y;
        return __builtin_popcountll (res);
}

Now, as for why I would want to do this, well, on some platforms, yes, this would just translate to gcc emitting a call to a function that calculates popcount. For instance, on x64 without popcnt, gcc spits out (Godbolt's GCC Online):

hammingDistance:
    sub rsp, 8
    xor rdi, rsi
    call    __popcountdi2
    add rsp, 8
    ret

OTOH, if you have a platform that supports POPCOUNT, like x64 models including nehalem and after (which have POPCNT), you get (Godbolt's GCC Online):

hammingDistance:
    xor rdi, rsi
    popcnt  rax, rdi
    ret

which should be waaay faster, especially once inlined.


But back to the original question. Can you take the Hamming Weight of the XOR of two strings to find their Hamming Distance? ie:

HD = HW (x xor y)
Community
  • 1
  • 1
haneefmubarak
  • 1,911
  • 1
  • 21
  • 32
  • Are you asking if the Hamming weight of the xor of two bitstrings is equal to their Hamming distance? (Answer : Yes, it trivially follows from the definition.) Or are you asking for a generalization of this efficient method to general strings? – Pradhan Aug 02 '14 at 20:17
  • I'm asking for both the first and for whether my implementation works too. – haneefmubarak Aug 02 '14 at 20:18
  • 3
    It is interesting to note that popcnt is not always the fastest solution. On the Intel Haswell processor, an AVX2 in-register lookup table method is faster. A utility that can test various population count methods is here: http://notabs.org/blcutil/. –  Aug 03 '14 at 02:56

2 Answers2

6

Hamming distance between two equal length strings, x and y, is defined to be the number of positions where they differ. In the case of x and y being bitstrings, x^y is a string with 1s in exactly the positions they differ. Thus, HammingDistance(x,y) = Number of 1s in x^y, for bitstrings. Also, HammingWeight(x) = number of 1s in x for a bitstring x. Thus, your first claim, HammingDistance(x,y) = HammingWeight(x^y) is true for bitstrings. Having established that, it is clear that your implementation is correct.

Pradhan
  • 16,391
  • 3
  • 44
  • 59
3

Yes, that works. For each bit, the bit is 1 if and only if the input bits are different. Hence, applied to a whole bit vector, the result has as many one bits (HW) as the inputs have differing bits (HD). And your code seems to exploit that relationship perfectly well. In fact, this shortcut is even mentioned further into the Hamming weight article you link to (Efficient implementation):

The Hamming distance of two words A and B can be calculated as the Hamming weight of A xor B.

  • Wow. I guess skimming doesn't work too well all the time then. I'll try and spend more time actually reading in the future. – haneefmubarak Aug 02 '14 at 20:52