Redundancy in Levenshtein distance algorithm

Question

In the typical dynamic Levenshtein distance algorithm, to compute the value of cell d[i][j], where i and j are the row and column numbers, respectively, we take the minimum of d[i-1][j-1]+0/1, d[i-1][j]+1 and d[i][j-1]+1. However, it seems to me that the minimum of d[i-1][j-1]+0/1 and d[i-1][j]+1 is always going to be d[i-1][j-1]+0/1, in which case including d[i-1][j]+1 in the computation seems redundant. Is it ever the case that d[i-1][j-1]+0/1 > d[i-1][j]+1 in the Levenshtein distance algorithm, and if not, wouldn't it be more efficient to omit this comparison?

EDIT: Sorry for the under-researched question; any standard run of the algorithm shows instances where d[i-1][j-1]+0/1 > d[i-1][j]+1:

    A
 +-+-+
 |0|1|
 +-+-+
A|1|0|
 +-+-+

(consider the second row).

Codor · Answer 1 · 2014-07-14T20:15:33.873

1

Referring to the Wikipedia Article, the minimum in the last case must be taken in the "deletion" case.

Suppose we want to calculate the Levenshtein distance between abc and ab (from now on fixed and omitted from notation).

Iterative evaluation yields the following intermediate results.

lev(0,0) = 0 (1st case applies)
lev(0,1) = 1 (1st case applies)
lev(0,2) = 2 (1st case applies)

lev(1,0) = 1 (1st case applies)
lev(1,1) = min(2,2,0) (2nd case, minimum taken in last term) = 0
lev(1,2) = min(1,2,1) (2nd case, minumum taken in last term) = 1

lev(2,0) = 2 (1st case applies)
lev(2,1) = min(3,1,2) (2nd case, minimum taken in second term) = 1 (*)
lev(2,2) = min(2,2,0) (2nd case, minimum taken in the last term) = 0

lev(3,0) = 3 (1st case applies)
lev(3,1) = min(4,2,2) (2nd case, minimum taken in the second and third term) = 2
lev(3,2) = min(3,1,2) (2nd case, minimum taken in the second term) = 1 (*)

The row marked with (*) are cases where the second case occurs, but the minimum is not taken in the last term. An online calculator also showing the dynamic programming table can be found here.

edited Jul 14 '14 at 20:15

answered Jul 14 '14 at 19:09

Codor

17,447
9
29
56

Yes it does, if `a[i] == b[j]`. – TonyK Jul 14 '14 at 19:18
"Note that this implementation does not fit the definition precisely" - though this version is deemed equivalent, I'm looking for a solution in terms of the traditional variant. – Sean Kelleher Jul 14 '14 at 19:24
@SeanKelleher Okay thanks for your remark; do you refer to a specific presentation of the algorithm? If yes, please include it in your question, it will make communication a bit easier. In fact, I realize I somewhat misread the notation for the last case there. – Codor Jul 14 '14 at 19:28
@SeanKelleher Regarding the reasoning behind the algorithm, the minimum must be attained in the "mismatch" case. – Codor Jul 14 '14 at 19:35
@Codor I've edited the question as I've realised the answer is trivial. Thanks for your time. – Sean Kelleher Jul 14 '14 at 19:36
@SeanKelleher Well you're welcome, it was a quite insightful hour. – Codor Jul 14 '14 at 20:17

Redundancy in Levenshtein distance algorithm

1 Answers1