0

I am studying the Stanford NLP course by the following slides:https://web.stanford.edu/class/cs124/lec/med.pdf. The definition of edit Distance algorithm in this slide as following:

Initialization

D(i,0) = i
D(0,j) = j

Recurrence Relation:

 For each  i = 1…M
    For each  j = 1…N


       D(i,j)= min  {D(i-1,j) + 1, D(i,j-1) + 1, 
                     D(i-1,j-1) +   2(if X(i) ≠ Y(j) )  
                                    0(if X(i) = Y(j))}

why D(i-1,j-1) + 2 not (+1), if X(i) ≠ Y(j). I found the definition of edit Distance algorithm in the wikipedia is '+1':https://en.wikipedia.org/wiki/Levenshtein_distance. Could you guys explain the difference of these two definition, please. I am a new guy to NLP. Thanks!

tktktk0711
  • 1,656
  • 7
  • 32
  • 59

1 Answers1

4

When editing one string, what is the minimal number of changes you need to do in order to get another string?

This is a general, not specific, definition for editing distance. To get a precise definition, you need to define what a "change" is.

  • If a "change" includes "replacing one letter by another", you have +1 in your definition
  • If a "change" doesn't include "replacing one letter by another", you have +2 in your definition

Example: how many changes do you need to turn feh into fah?

  • One change - just replace e by a
  • Two changes - remove e; then add a in the same place

Both answers are useful, and lead to two slightly different definitions of editing distance.

anatolyg
  • 26,506
  • 9
  • 60
  • 134
  • thanks for your @anatolyg comments. you mean if plus 2 include two operations: deletion and substitution. But I am wondering, if plus 2, is it difficult to get the least operations or minimum edit distance ? Could you give example to explain it in detail. – tktktk0711 Oct 21 '16 at 00:29