
Suppose we have two strings:

  1. ccttgg
  2. gacgct

The edit distance of these two strings is 6.

One possible pair of substrings is:

  1. cctt--
  2. gacg--

Their edit distance is 4.

The remaining parts, which complete the original two strings, are:

  1. ----gg
  2. ----ct

and their edit distance is 2.

So 4 + 2 = 6, which is the original edit distance.

Is this type of assumption always correct?

If it's not, is there a way to compute the edit distance between two strings from the edit distances of their substrings?


Edit: to be clearer, my definition of edit distance is the Levenshtein distance, with a cost of 1 for each insertion, deletion, or substitution of differing characters, and a cost of 0 when the characters are equal. I'm not considering the Damerau distance with transpositions.
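As a minimal sketch of the distance defined above (unit cost for insertion, deletion and substitution of differing characters, no transpositions), here is the standard full-matrix dynamic program in Python. The function name `levenshtein` is only an illustrative choice. It reproduces the numbers in the example: 6 for the whole strings, 4 and 2 for the two pieces.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (insert, delete, substitute; no transpositions)."""
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free if characters match)
            )
    return d[len(a)][len(b)]


print(levenshtein("ccttgg", "gacgct"))  # 6
print(levenshtein("cctt", "gacg"))      # 4
print(levenshtein("gg", "ct"))          # 2
```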

John


No

Counterexample

Consider the strings:

  1. aba
  2. bab

They have an edit distance of 2 by deleting an "a" from the front and adding a "b" to the end.

If these are broken into substrings such as

  1. ab, a
  2. ba, b

then the first substrings have an edit distance of 2 and the second substrings have an edit distance of 1 for a total of 3.
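The counterexample can be checked with a small self-contained Python sketch (the `levenshtein` helper is a hypothetical unit-cost implementation using two rolling rows). The check also hints at why splitting can only overestimate: concatenating an edit script for each piece gives a valid script for the whole strings, so the sum over any split is an upper bound on the true distance, never below it.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


whole = levenshtein("aba", "bab")                        # 2
split = levenshtein("ab", "ba") + levenshtein("a", "b")  # 2 + 1
print(whole, split)  # 2 3

# Every way of cutting both strings gives a sum >= the true distance.
a, b = "aba", "bab"
print(all(levenshtein(a[:i], b[:j]) + levenshtein(a[i:], b[j:]) >= whole
          for i in range(len(a) + 1) for j in range(len(b) + 1)))  # True
```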

dta
  • I'm not considering swap/transposition like the Damerau distance – John Apr 21 '21 at 16:58
  • Updated to provide a counterexample for Levenshtein distance. – dta Apr 21 '21 at 17:10
  • Thank you, I already had some serious doubts. Do you know a way to compute an edit distance by splitting the two initial strings into substrings? – John Apr 21 '21 at 17:20
  • If you allow only substitutions, then that property would hold. Otherwise, I don't think you can split substrings and consider them independent. The [algorithm](https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix) does involve finding the distance between prefixes of the strings, so that may be relevant. – dta Apr 21 '21 at 18:59
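Building on the prefix hint in the last comment, a sketch under the unit-cost definition: if the first string is cut at a fixed position `i`, taking the minimum over every cut position `j` in the second string recovers the exact distance. This is the same decomposition Hirschberg's algorithm relies on. The sketch also shows that with substitutions only (Hamming distance on equal-length strings), cutting both strings at the same position is exactly additive. The names `hamming` and `split_distance` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Substitution-only distance; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def split_distance(a: str, b: str, i: int) -> int:
    """Exact distance: fix the cut of `a` at i, minimize over every cut j of `b`."""
    return min(levenshtein(a[:i], b[:j]) + levenshtein(a[i:], b[j:])
               for j in range(len(b) + 1))


a, b = "aba", "bab"
print(levenshtein(a, b))                                     # 2
print([split_distance(a, b, i) for i in range(len(a) + 1)])  # [2, 2, 2, 2]
print(hamming("ab", "ba") + hamming("a", "b"), hamming(a, b))  # 3 3
```

Calling `levenshtein` once per `j` is only for illustration; Hirschberg's algorithm obtains all prefix and suffix distances for a cut with one forward and one backward pass.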