I have a string S of length 1000 and a query string Q of length 100. I want to calculate the edit distance between Q and every substring of S of length 100. One naive way to do this is to run the dynamic-programming edit distance on every substring independently, i.e. `edDist(q, s[0:100])`, `edDist(q, s[1:101])`, `edDist(q, s[2:102])`, ..., `edDist(q, s[900:1000])`.

    from numpy import zeros

    def edDist(x, y):
        """ Calculate edit distance between sequences x and y using
            matrix dynamic programming.  Return distance. """
        D = zeros((len(x)+1, len(y)+1), dtype=int)
        # First row/column: distance between a prefix and the empty string
        D[0, 1:] = range(1, len(y)+1)
        D[1:, 0] = range(1, len(x)+1)
        for i in range(1, len(x)+1):
            for j in range(1, len(y)+1):
                # Substitution costs 1 only when the characters differ
                delt = 1 if x[i-1] != y[j-1] else 0
                D[i, j] = min(D[i-1, j-1]+delt, D[i-1, j]+1, D[i, j-1]+1)
        return D[len(x), len(y)]

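To make the baseline concrete, here is a minimal sketch of the naive approach described above: one independent DP per window. The names `ed_dist_rows` and `all_window_distances` are made up for illustration. The helper uses the same recurrence as `edDist` but keeps only two rows, so it needs no NumPy.

```python
def ed_dist_rows(x, y):
    """Edit distance with the same recurrence as edDist, keeping two rows."""
    prev = list(range(len(y) + 1))      # row 0: distance from "" to y[:j]
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)       # column 0: distance from x[:i] to ""
        for j in range(1, len(y) + 1):
            delt = 1 if x[i - 1] != y[j - 1] else 0
            curr[j] = min(prev[j - 1] + delt, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[len(y)]


def all_window_distances(q, s):
    """Naive baseline: an independent DP for every window of len(q) chars."""
    w = len(q)
    return [ed_dist_rows(q, s[i:i + w]) for i in range(len(s) - w + 1)]
```

For the sizes in the question this runs 901 DPs of 100 x 100 cells each, roughly 9 million cell updates in total.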
Can somebody suggest an alternative approach that computes these edit distances more efficiently? My thinking is that once we know `edDist(q, s[900:1000])`, we might be able to use it to compute `edDist(q, s[899:999])`, since the two substrings differ only by one character dropped at the end and one added at the front, and then proceed backward down to `edDist(q, s[0:100])`, reusing the previously calculated edit distance at each step.
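One way the previous result could be reused, offered only as a sketch and not as a known-best method: neighbouring windows are at most 2 edits apart (drop one character, add one), so by the triangle inequality the new distance is at most the previous distance plus 2. That upper bound `k` allows a banded DP that only fills cells with `|i - j| <= k`, which is exact whenever the true distance is at most `k`. The function names below are hypothetical.

```python
def banded_ed_dist(x, y, k):
    """Edit distance using only DP cells with |i - j| <= k.

    Exact when the true distance is <= k; otherwise it may overestimate
    (possibly returning infinity)."""
    INF = float("inf")
    n, m = len(x), len(y)
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            curr[0] = i                 # column 0 is still inside the band
        for j in range(max(1, lo), hi + 1):
            delt = 1 if x[i - 1] != y[j - 1] else 0
            curr[j] = min(prev[j - 1] + delt, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return prev[m]


def window_distances_with_bound(q, s):
    """Distances for all windows, bounding each band by the previous result."""
    w = len(q)
    out = []
    for i in range(len(s) - w + 1):
        # First window: equal-length strings are at most w edits apart.
        # Later windows: at most 2 more edits than the neighbouring window.
        k = w if not out else out[-1] + 2
        out.append(banded_ed_dist(q, s[i:i + w], k))
    return out
```

This sketch walks forward, but the same bound holds when going backward from `s[900:1000]`. Each window then costs about `len(q) * k` cell updates instead of `len(q) ** 2`, so it only pays off when the distances are small relative to the query length.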