1

Given two binary strings a and b, find the sum of the Hamming distances between a and all contiguous substrings of b of length |a|.

Input:

01

00111

Output:

3

Explanation: For the first sample case, there are four contiguous substrings of b of length |a|: "00", "01", "11", and "11". The distance between "01" and "00" is |0 - 0| + |1 - 0| = 1. The distance between "01" and "01" is |0 - 0| + |1 - 1| = 0. The distance between "01" and "11" is |0 - 1| + |1 - 1| = 1. The last distance counts twice, as there are two occurrences of the string "11". The sum of these Hamming distances is 1 + 0 + 1 + 1 = 3.

For this question, I can only think of a brute-force solution with time complexity O(|a|·|b|), like a naive string-matching algorithm. Is there a faster algorithm for this problem?
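
For reference, the brute-force approach mentioned above can be sketched as follows. This is only a minimal baseline; the function name `hamming_sum_brute` is made up for illustration, and the inputs are assumed to be strings over '0' and '1'.

```python
def hamming_sum_brute(a: str, b: str) -> int:
    """Sum of Hamming distances between a and every length-|a| substring of b."""
    m, n = len(a), len(b)
    total = 0
    # Try every starting offset of a inside b (no offsets exist if m > n).
    for start in range(n - m + 1):
        # Count the positions where a and the current substring differ.
        for i in range(m):
            if a[i] != b[start + i]:
                total += 1
    return total


print(hamming_sum_brute("01", "00111"))  # sample from the question: 3
```

It does O(|a|·|b|) character comparisons, so it is mainly useful as a correctness check for faster versions.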

BlackPearl
RAHUL

2 Answers

1

As you are computing the sum of Hamming distances, it can be done very fast:

H := sum of Hamming distances (initially 0)
compute arrays A and B over b (0-indexed) as follows:
    A[i]: the number of zeros in b[0..i]
    B[i]: the number of ones in b[0..i]
    (take A[-1] = B[-1] = 0)

iterate over the elements of a, for i <- 0 : |a|-1:
    a[i] is compared with every element of b[i] .. b[|b|-|a|+i],
    so its contribution to the sum of distances is:
        if a[i] == 0:
            H += B[|b|-|a|+i] - B[i-1]
        else:
            H += A[|b|-|a|+i] - A[i-1]

In the above pseudocode, B[|b|-|a|+i] - B[i-1] is the number of ones between the i-th and the (|b|-|a|+i)-th elements of b, inclusive (and likewise A[|b|-|a|+i] - A[i-1] is the number of zeros). These are exactly the elements that the i-th member of a is compared with when computing the sum of Hamming distances. Hence, the time complexity of this algorithm is Θ(|a| + |b|).
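
A runnable Python sketch of this prefix-count idea might look like the following. It is an illustration rather than a literal translation: the name `hamming_sum_prefix` is hypothetical, it keeps a single array of ones (the zeros follow from the window width), and it uses half-open prefix counts so that no `i-1` special case is needed.

```python
def hamming_sum_prefix(a: str, b: str) -> int:
    """Sum of Hamming distances between a and all length-|a| substrings of b."""
    m, n = len(a), len(b)
    if m == 0 or m > n:
        return 0  # nothing to compare

    # ones[k] = number of '1' characters in b[:k], so ones[0] == 0.
    ones = [0] * (n + 1)
    for k, ch in enumerate(b):
        ones[k + 1] = ones[k] + (ch == "1")

    width = n - m + 1  # how many bits of b each bit of a is compared with
    total = 0
    for i, ch in enumerate(a):
        # a[i] is compared with b[i], ..., b[i + width - 1].
        ones_in_window = ones[i + width] - ones[i]
        # A '0' in a mismatches every '1' in its window, and vice versa.
        total += ones_in_window if ch == "0" else width - ones_in_window
    return total


print(hamming_sum_prefix("01", "00111"))  # sample from the question: 3
```

Building the array is one pass over b and the summation is one pass over a, which matches the Θ(|a| + |b|) bound.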

OmG
1

You can do this in linear time and constant space.

Each bit in a will be compared with |b| - |a| + 1 bits in b, and each mismatch will add 1 to the sum of all Hamming distances.

Furthermore, for each bit of a, we don't need to know the whole bit sequence that it will be compared to from b. We only need to know how many zeros and how many ones it has. As we move forward one bit in a, the corresponding range shifts forward by one bit in b, and we can easily update these counts in constant time.
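
To make the per-bit view concrete, here is a small sketch that lists, for each bit of a, the range of b it is compared with and the number of mismatches it contributes. The helper name `per_bit_mismatches` is made up for this illustration, and it deliberately slices out each range (so it is not constant space); it only shows what the counts mean.

```python
def per_bit_mismatches(a: str, b: str) -> list:
    """For each bit of a: (bit, range of b it is compared with, mismatches)."""
    width = len(b) - len(a) + 1  # each bit of a meets this many bits of b
    if width <= 0:
        return []  # a is longer than b, so there is nothing to compare
    rows = []
    for i, bit in enumerate(a):
        window = b[i:i + width]
        # A '0' mismatches the ones in its range; a '1' mismatches the zeros.
        mismatches = window.count("1" if bit == "0" else "0")
        rows.append((bit, window, mismatches))
    return rows


rows = per_bit_mismatches("01", "00111")
for bit, window, mismatches in rows:
    print(bit, window, mismatches)
print(sum(row[2] for row in rows))  # 2 + 1 = 3 for the sample
```

For the sample, the first bit '0' is compared with "0011" (two ones) and the second bit '1' with "0111" (one zero). Consecutive ranges differ by one bit dropped at the front and one added at the back, which is why the counts can be updated in constant time.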

Here's an implementation in python:

def HammingSum(a,b):
    # nothing to compare if a is empty or longer than b
    if not a or len(a) > len(b):
        return 0
    # compare initial range in b to first bit in a
    range0 = len(b)-len(a)+1
    numZeros = 0
    numOnes = 0
    for i in range(range0):
        if b[i]=='0':
            numZeros += 1
        else:
            numOnes += 1
    total = numOnes if a[0]=='0' else numZeros

    #adjust the range as we compare to the other bits

    for i in range(len(b)-range0):
        #count the bit we're adding to the end of the range
        if b[range0+i]=='0':
            numZeros += 1
        else:
            numOnes += 1

        #uncount the bit we remove from the start of the range
        if b[i]=='0':
            numZeros -= 1
        else:
            numOnes -= 1

        #compare range with bit in a
        total += numOnes if a[i+1]=='0' else numZeros

    return total
Deduplicator
Matt Timmermans