Problem: I know the standard edit distance DP formulation, which runs in O(mn) for two strings of lengths n and m. But I recently learned that if we only need the edit distance value f and it is bounded, f <= s, then it can be computed in O(min(m,n) + s^2) or O(s*min(m,n)) time [Wikipedia].

Please explain the DP formulation behind it if it is DP based, or otherwise explain the algorithm.

See the "Improved algorithm" section of this link: http://en.wikipedia.org/wiki/Edit_distance

One more link, about an improved version of Ukkonen's algorithm: http://www.berghel.net/publications/asm/asm.php

Thanks in advance.

v78

1 Answer

You can calculate the edit distance in O(min(n, m) * s) time using the following simple idea:

Consider the i-th row of the DP table.

If we know that the answer is <= s, then we are only interested in the cells with coordinates (i, i - s), (i, i - s + 1), ..., (i, i + s), because every other cell holds a value strictly greater than s: the value in cell (i, j) is at least |i - j|, since the two prefixes differ in length by that much.

For example, suppose we know that the edit distance between "abacaba" and "baadba" is no more than 3.

DP table for these strings:

So we can skip the red cells, because their values are greater than s.

The complexity of the algorithm is O(min(n, m) * s), because in each row we compute only s cells to the left and s cells to the right of the main diagonal.
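The band idea described above can be sketched roughly as follows in Python. This is only an illustration under some assumptions of mine, not code from any of the linked pages: the function name `bounded_edit_distance` is hypothetical, it returns `None` when the distance exceeds s, and every cell outside the band (or larger than s) is capped at the sentinel `s + 1`.

```python
def bounded_edit_distance(a, b, s):
    """Return the Levenshtein distance of a and b if it is <= s, else None."""
    # Edit distance is symmetric, so let the shorter string index the rows.
    if len(a) > len(b):
        a, b = b, a
    n, m = len(a), len(b)
    # Cell (i, j) is always >= |i - j|, so a length gap above s settles it.
    if m - n > s:
        return None
    big = s + 1           # assumed sentinel for "greater than s"
    width = 2 * s + 1     # band offsets k = 0..2s map to column j = i - s + k
    # Row 0: cell (0, j) holds j.
    prev = [big] * width
    for k in range(width):
        j = k - s
        if 0 <= j <= m:
            prev[k] = min(j, big)
    for i in range(1, n + 1):
        cur = [big] * width
        for k in range(width):
            j = i - s + k
            if j < 0 or j > m:
                continue              # outside the table
            if j == 0:
                cur[k] = min(i, big)  # column 0: cell (i, 0) holds i
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = prev[k] + cost                  # diagonal cell (i-1, j-1)
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)  # cell above (i-1, j)
            if k > 0:
                best = min(best, cur[k - 1] + 1)   # cell to the left (i, j-1)
            cur[k] = min(best, big)
        prev = cur
    d = prev[m - n + s]   # offset of cell (n, m) inside the band
    return d if d <= s else None


print(bounded_edit_distance("abacaba", "baadba", 3))
print(bounded_edit_distance("abacaba", "baadba", 2))
```

Each row stores only 2s + 1 values, so the work is proportional to min(n, m) * s. Values above s are not computed exactly, only recognised as "too large", which is enough when all we need is the distance up to the bound s.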

Nikita Sivukhin
  • But each entry (i, j) of the table depends on the entries (i-1, j-1), (i-1, j) and (i, j-1). How do you find entries such as (5, 2) and (4, 1) in the general case where a[i] != b[j] (0-indexed)? – v78 Oct 04 '14 at 10:39
  • If a cell depends on a red cell, we can assume that the red cell has value s. Of course, this algorithm cannot compute every value correctly. The important fact is that it correctly computes all cells whose value is no more than s. Because of that, we can find the edit distance (since we know it is no more than s). – Nikita Sivukhin Oct 04 '14 at 11:48
  • Can you provide some references for this idea? – v78 Oct 04 '14 at 14:44
  • http://ntz-develop.blogspot.ru/2011/03/fuzzy-string-search.html On this page you can read about distances between strings. There are also a few words about the algorithm above. – Nikita Sivukhin Oct 04 '14 at 17:38
  • How do we determine the value of s? – otaku Oct 07 '14 at 16:02
  • We can choose s based on the specifics of the problem. Maybe we will ignore strings with a large edit distance (I think that may be useful in some text analysis). But I don't know a fast algorithm that can calculate s. (I found an article on the internet that may be interesting: http://www.mit.edu/~andoni/papers/compEdit.pdf) – Nikita Sivukhin Oct 08 '14 at 05:41
  • Thanks, I'll have a look. – otaku Oct 10 '14 at 06:12