Why the definition of edit Distance algorithm in Stanford NLP course plus 2 not 1

Question

I am studying the Stanford NLP course by the following slides:https://web.stanford.edu/class/cs124/lec/med.pdf. The definition of edit Distance algorithm in this slide as following:

Initialization

D(i,0) = i
D(0,j) = j

Recurrence Relation:

 For each  i = 1…M
    For each  j = 1…N


       D(i,j)= min  {D(i-1,j) + 1, D(i,j-1) + 1, 
                     D(i-1,j-1) +   2(if X(i) ≠ Y(j) )  
                                    0(if X(i) = Y(j))}

why D(i-1,j-1) + 2 not (+1), if X(i) ≠ Y(j). I found the definition of edit Distance algorithm in the wikipedia is '+1':https://en.wikipedia.org/wiki/Levenshtein_distance. Could you guys explain the difference of these two definition, please. I am a new guy to NLP. Thanks!

score 4 · Accepted Answer · answered Oct 20 '16 at 09:45

When editing one string, what is the minimal number of changes you need to do in order to get another string?

This is a general, not specific, definition for editing distance. To get a precise definition, you need to define what a "change" is.

If a "change" includes "replacing one letter by another", you have +1 in your definition
If a "change" doesn't include "replacing one letter by another", you have +2 in your definition

Example: how many changes do you need to turn feh into fah?

One change - just replace e by a
Two changes - remove e; then add a in the same place

Both answers are useful, and lead to two slightly different definitions of editing distance.

thanks for your @anatolyg comments. you mean if plus 2 include two operations: deletion and substitution. But I am wondering, if plus 2, is it difficult to get the least operations or minimum edit distance ? Could you give example to explain it in detail. — tktktk0711, Oct 21 '16 at 00:29

Why the definition of edit Distance algorithm in Stanford NLP course plus 2 not 1

1 Answers1