0

I am about to tackle a programming problem about the Levenshtein Distance. And according to the definition given on my sheet, it states that the Lenveshtein distances is equal to the number of substitutions, insertions and deletions between two strings. However wouldn't a subsitution just be a deletion and then an insertion? What am I missing here?

entropy
  • 169
  • 3
  • 13

1 Answers1

1

You can achieve the effect of a substitution using an insertion plus a deletion, yes. But if you limit yourself to insertions and deletions only, each such "substitution" you create this way will cost you 2 instead of 1. That may be realistic for some applications, but sometimes it's more plausible to suppose that a substitution costs the same/is as likely as an insertion or a deletion, rather than twice as costly/half as likely.

[EDIT]

In fact it's possible and sometimes useful to invent much more general edit distances than the standard Levenshtein distance. You can give any arbitrary costs to insertions, deletions and substitutions. You can even extend the set of operations to also include transpositions (changing ab to ba for some fixed cost) or block operations ("insert a copy of the length-j substring starting at position i" for some fixed cost). The effect of a transposition is of course achievable without a special "transposition" move using a deletion plus an insertion, but again this results in the move costing more than either a deletion or an insertion alone. If your application is that you want to find the English word that a person most likely "meant" when they type a word that's not in the dictionary, you would probably prefer to use a distance where a transposition has low cost, because this is a common typing error.

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
  • Also, when a string is implemented via an array, insertions and deletions may cost O(n) each, vs O(1) for substitution – James Waldby - jwpat7 Jul 26 '13 at 14:18
  • @jwpat7: That's true in one sense, but I don't see how it's relevant... Actually calculating an edit distance doesn't require any such operations (you just fill in a DP matrix). Also if you have the source string and an edit script and you want to produce the target string in a new buffer, you can perform each edit operation in O(1) time. – j_random_hacker Jul 26 '13 at 14:31