Given two strings of equal length, Levenshtein distance gives the minimum number of transformations necessary to turn the first string into the second. However, I'd like to find a way to adjust the algorithm for multiple pairs of strings, given that they were all generated in the same way.
-
You want the Lev. distance between all the strings and their parent? Or all mutual Lev. distances between any two arbitrary strings from the full group? Or Lev. distances between A->B->C->D->E etc.? – Marc B Jan 26 '11 at 20:16
-
Let there be one algorithm to convert A->B == C->D == E->F. I'm trying to find that one algorithm. – user490735 Jan 26 '11 at 20:19
-
Can you be more specific about "generated in the same way?" And can you elaborate on your above comment? This is an interesting question, but I don't understand what you're asking. – templatetypedef Jan 26 '11 at 20:24
-
Do the algorithm multiple times? This question is not well defined, and the above comment makes no sense. According to the comment, you want to find the distances and see if they equal each other? Then just do the algorithm multiple times. If you want a binary distance between three words, that would be a different algorithm. – Lee Louviere Jan 26 '11 at 20:24
-
How is that _not_ equivalent to computing Levenshtein distance between A and B, then C and D, then E and F, etc? I think you need to elaborate more on what you're trying to do... – triazotan Jan 26 '11 at 20:26
-
It is given that each pair was generated in the same way. But when L. distance is computed, this algorithm isn't necessarily found immediately. The distances aren't equal for each pair and I want to adjust for that, until they are. That won't be minimum distance anymore, but the goal is to find the minimum common algorithm, which is computed along with the distances. The distance is just a number, I need to find the algorithm. – user490735 Jan 26 '11 at 20:29
1 Answer
Reading the comments, it appears that this is the problem:
You are given a set of pairs of strings, all of the same length, where each pair consists of an input to some function and the corresponding output of that function. So, for the pair (A, B), we know that f(A) = B. The goal is to reverse engineer f() from a large set of (A, B) pairs.
Using Levenshtein distance on the entire set will, at most, tell you the maximum number of transformations that must take place for any one pair. It gives you a number, not the rule that produced it.
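To see why the distance alone says little, here is a minimal Python sketch that runs a textbook dynamic-programming Levenshtein distance over a few made-up pairs. The sample pairs, and the idea that the unknown f prepends a character and drops the last one, are assumptions for illustration only, not data from the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical pairs, assumed to come from the same unknown f.
pairs = [("ABCDE", "xABCD"), ("HELLO", "xHELL"), ("AAAAA", "xAAAA")]
distances = [levenshtein(a, b) for a, b in pairs]
print(distances, max(distances))
```

For these pairs the distances come out as 2, 2 and 1 even though the same rule produced every output, which matches the observation in the comments that the per-pair distances need not agree.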
A better start would be Hamming distance (modified to allow multiple characters) or Jaccard similarity, to identify which positions in the strings do not change at all across all of the pairs. Then you are left with only the positions that do change.
This will fail if the letters shift.
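The position-by-position comparison could be sketched in Python roughly as follows. The helper names `hamming` and `changed_positions` and the sample pairs are hypothetical, and the sketch assumes every string has the same length, as stated in the question.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def changed_positions(pairs):
    """Sorted indices where input and output differ in at least one pair."""
    length = len(pairs[0][0])
    changed = set()
    for a, b in pairs:
        if len(a) != length or len(b) != length:
            raise ValueError("all strings must have the same length")
        changed.update(i for i in range(length) if a[i] != b[i])
    return sorted(changed)


# Hypothetical pairs where the unknown f only rewrites positions 1 and 3.
stable_pairs = [("abcde", "aXcYe"), ("hello", "hXlYo"), ("12345", "1X3Y5")]
print(changed_positions(stable_pairs))

# A pair whose characters all moved by one place defeats the comparison.
print(hamming("ABCDE", "xABCD"))
```

On the first set only positions 1 and 3 are reported, so the search for f can focus there. On the moved pair every position differs, so the Hamming distance equals the full string length and gives no hint about what actually happened.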
To detect a shift, you want to use global alignment (Needleman-Wunsch). You will then see something like "ABCDE" => "xABCD", which shows that from the input to the output the characters were shifted one position to the right.
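A minimal Needleman-Wunsch sketch in Python shows how the alignment exposes the moved characters. The scoring values (match +1, mismatch -1, gap -1) and the use of `-` as the gap symbol are assumptions chosen for illustration; the sketch also assumes the strings themselves never contain `-`.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_in, aligned_out, total = needleman_wunsch("ABCDE", "xABCD")
print(aligned_in)   # -ABCDE
print(aligned_out)  # xABCD-

# Leading gaps in the aligned input count how many characters the output
# gained at the front, i.e. how far the original characters moved.
offset = len(aligned_in) - len(aligned_in.lstrip("-"))
print(offset)  # 1
```

If the same offset shows up for every pair in the set, that is reasonable evidence that f moves the characters by that amount, and the remaining differences can then be compared position by position.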
Overall, I feel that Levenshtein distance will do very little to help you get at the original algorithm.