I am trying to implement the Hamming distance in Python. The Hamming distance measures how many bit positions differ between two codewords of equal length. It can be computed by taking the exclusive OR (XOR) of the two codewords and counting the 1 bits in the result. For example, for the codewords 10011101 and 10111110, the XOR is 00100011, so the Hamming distance is 1 + 1 + 1 = 3.

My code is as follows:

def hamming_distance(codeword1, codeword2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(codeword1) == len(codeword2)
    x, y = int(codeword1, 2), int(codeword2, 2) # '2' specifies that we are reading a binary number
    count, z = 0, x^y
    while z:
        count += 1
        z &= z - 1
    return count

def checking_codewords(codewords, received_data):
    closestDistance = len(received_data) # set default/placeholder closest distance as the maximum possible distance.
    closestCodeword = received_data # default/placeholder closest codeword
    for i in codewords:
        if(hamming_distance(i, received_data) < closestDistance):
            closestCodeword = i
            closestDistance = hamming_distance(i, received_data)
    return closestCodeword

print(checking_codewords(['1010111101', '0101110101', '1110101110', '0000000110', '1100101001'], '0001000101'))

hamming_distance(codeword1, codeword2) takes two codewords, codeword1 and codeword2, as strings of binary digits and returns the Hamming distance between them.

checking_codewords(codewords, received_data) should determine the correct codeword if and only if there are errors in the received data (i.e., the output is the corrected codeword string), although, as you can see, I haven't added the "if and only if there are errors in the received data" part yet.

I just tested the checking_codewords function with a set of examples, and it seems to have worked correctly for all of them except one. When I use the set of codewords ['1010111101', '0101110101', '1110101110', '0000000110', '1100101001'] and the received data '0001000101' the output is 0101110101, which is apparently incorrect. Is there something wrong with my code, or is 0101110101 actually correct and there is something wrong with the example? Or was this just a case where there was no error in the received data, so my code missed it?

The Pointer
  • Did you try to trace through the operation of the code for that example input? E.g., what is the result you get for `hamming_distance('0101110101', '0001000101')`? Is that correct? How about for the other codewords? Now, does the loop in `checking_codewords` appear to be operating correctly? "Or was this just a case where there was no error in the received data, so my code missed it?" Well, shouldn't you *know* what the correct answer is supposed to be? Try working it out by hand. If you can't do that for a toy example, how can you expect to write correct code? – Karl Knechtel Mar 20 '21 at 07:50
  • Please also read https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Karl Knechtel Mar 20 '21 at 07:50
  • @KarlKnechtel Yes, I only saw your comment after I had already posted this. I am reading that now. – The Pointer Mar 20 '21 at 07:50
  • As a hint: the problem is in a wheel that you're reinventing. If you check the built-in methods for integers (`dir(int)`) you should see something that will make your life a lot easier. – Karl Knechtel Mar 20 '21 at 07:53
  • @KarlKnechtel Thanks for the tip. I'm guessing the "wheel" you're referring to is `hamming_distance`? – The Pointer Mar 20 '21 at 08:00
  • @ThePointer Actually, if we just count the 1s in the XOR result to get the Hamming distance in your example, both `0101110101` and `0000000110` have a distance of 3. Since `0101110101` occurs first in the list, it is the one returned. If you move the other codeword before it in the input, the output will change. If the Hamming distances are equal, is there any additional logic to decide which is closest? You can verify this by adding `print(codeword1, count)` before returning `count` in the `hamming_distance` function. – Ritwik G Mar 20 '21 at 08:12
  • No, I'm referring specifically to the counting of the set bits (your `while z:` loop). – Karl Knechtel Mar 20 '21 at 09:09
  • @accdias But doesn't `count, z = 0, x^y` set `z` to `x^y` and `count` to `0`? – The Pointer Mar 22 '21 at 07:56
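
For the set-bit counting loop discussed in the comments above, here is a minimal sketch comparing it with built-ins, assuming non-negative integers. `int.bit_count()` only exists from Python 3.10 onward, so the sketch falls back to `bin(z).count("1")` on older versions. The names `popcount_loop` and `popcount_builtin` are illustrative and do not appear in the original post.

```python
def popcount_loop(z):
    """Count set bits by clearing the lowest set bit on each pass (the loop from the question)."""
    count = 0
    while z:
        z &= z - 1  # clears the lowest set bit
        count += 1
    return count

def popcount_builtin(z):
    """Count set bits using built-ins; assumes z >= 0."""
    if hasattr(int, "bit_count"):  # available in Python 3.10+
        return z.bit_count()
    return bin(z).count("1")       # fallback for older versions

# XOR of the two codewords from the question's first example
z = int("10011101", 2) ^ int("10111110", 2)
print(popcount_loop(z), popcount_builtin(z))  # 3 3
```

Both approaches agree on this input, so the built-in is mainly a readability improvement rather than a change in behaviour.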

1 Answer

From my point of view, it is not clear why your algorithm converts the input strings to integers in order to compute a bitwise difference.

After asserting that the lengths are equal, you can simply count the differing positions using the zip function:

sum([c1!=c2 for c1,c2 in zip(codeword1,codeword2)])

This works because Python treats True as 1 and False as 0 when summing.
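
A small sketch of that behaviour, using the two codewords from the question's first example. It relies on bool being a subclass of int, so each comparison contributes 0 or 1 to the sum. The variable name `mismatches` is only for illustration.

```python
codeword1, codeword2 = "10011101", "10111110"

# One boolean per position: True where the characters differ
mismatches = [c1 != c2 for c1, c2 in zip(codeword1, codeword2)]
print(mismatches)       # [False, False, True, False, False, False, True, True]

# Summing booleans counts the True values
print(sum(mismatches))  # 3
```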

Simplifying your code a little:

def hamming_distance(codeword1, codeword2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(codeword1) == len(codeword2)
    return sum([c1!=c2 for c1,c2 in zip(codeword1,codeword2)])

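def checking_codewords(codewords, received_data):
    # Pair each distance with its codeword (not with received_data) so min() returns the closest codeword.
    # When distances tie, min() compares the codeword strings, so the lexicographically smaller one wins.
    min_dist, min_word = min([(hamming_distance(i, received_data), i) for i in codewords])
    return min_word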
    

print(checking_codewords(['1010111101', '0101110101', '1110101110', '0000000110', '1100101001'], '0001000101'))
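
For this example input, two codewords are tied at the minimum distance, so the result depends on how ties are broken. The following is a hedged sketch that reports every closest codeword instead of picking one. `closest_codewords` is a hypothetical helper, not part of the original code, and it raises ValueError where the original used an assert.

```python
def hamming_distance(codeword1, codeword2):
    """Count the positions at which two equal-length bit strings differ."""
    if len(codeword1) != len(codeword2):
        raise ValueError("codewords must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(codeword1, codeword2))

def closest_codewords(codewords, received_data):
    """Return the minimum distance and all codewords at that distance, in input order."""
    distances = [hamming_distance(w, received_data) for w in codewords]
    best = min(distances)
    return best, [w for w, d in zip(codewords, distances) if d == best]

codewords = ['1010111101', '0101110101', '1110101110', '0000000110', '1100101001']
best, candidates = closest_codewords(codewords, '0001000101')
print(best, candidates)  # 3 ['0101110101', '0000000110']
```

Returning all candidates makes the ambiguity visible. Whether a tie should be treated as an uncorrectable error or resolved by some rule depends on the coding scheme, which the question does not specify.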

Glauco