I'm sure you've all heard of the "Word game", where you try to change one word to another by changing one letter at a time, and only going through valid English words. I'm trying to implement an A* Algorithm to solve it (just to flesh out my understanding of A*) and one of the things that is needed is a minimum-distance heuristic.

That is, I need the minimum number of mutations that turn an arbitrary string a into another string b, where each mutation is one of these three:

1) Change one letter to another
2) Insert one letter at any position (before or after any letter)
3) Remove any letter

Examples

aabca => abaca:
aabca
abca
abaca
= 2

abcdebf => bgabf:
abcdebf
bcdebf
bcdbf
bgdbf
bgabf
= 4

I've tried many algorithms out; I can't seem to find one that gives the actual answer every time. In fact, sometimes I'm not sure if even my human reasoning is finding the best answer.

Does anyone know an algorithm for this purpose, or can anyone help me find one?

(Just to clarify, I'm asking for an algorithm that can turn any arbitrary string into any other, regardless of whether the strings are valid English words.)

Justin L.
  • If you don't actually care about the in-between steps being actual English words, which it seems you don't judging by the comment you left below, you should mention that in your question, since your description of the original word game seems to indicate you do care. – Lasse V. Karlsen May 15 '10 at 21:14
  • Sorry; I thought I was changing the terms of my question when I said "any arbitrary string", and then gave examples with strings that weren't words. But I think the context is a bit misleading, so I'll make it more clear. Thanks =) – Justin L. May 17 '10 at 04:35
  • The first paragraph of the question is completely misleading! Also this _has_ to be a dupe. –  May 17 '10 at 04:44

3 Answers

6

You want the minimum edit distance (or Levenshtein distance):

The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.

And one algorithm to determine the editing sequence is on the same page here.
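As a rough illustration of the distance itself (not the editing sequence), here is a minimal sketch of the standard dynamic-programming approach. The function name `levenshtein` is only an illustrative choice, and the sketch assumes all three operations cost 1, as in the question.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j].
    # Before any character of a is processed, that is j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca from a
                current[j - 1] + 1,            # insert cb into a
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


# The two examples from the question.
print(levenshtein("aabca", "abaca"))
print(levenshtein("abcdebf", "bgabf"))
```

Keeping only two rows at a time uses memory proportional to the length of b; recovering the actual editing sequence would require keeping the full table and backtracking through it.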

MSN
  • That may not apply since he is using English-only words. – Bill K May 13 '10 at 23:43
  • Actually, this is exactly what I'm looking for; I'm looking for a shortest-distance heuristic that doesn't bother with the dictionary. Thanks =) – Justin L. May 14 '10 at 07:10
  • Bear in mind that if you're trying to find the shortest path via valid words, the Levenshtein distance only provides a lower bound. The option that has the lowest Levenshtein distance could actually be further from the destination than one with a higher distance. – Nick Johnson May 16 '10 at 00:58
  • I'm trying to implement an A* pathfinding algorithm to find the shortest path; the implementation requires a lower-bound heuristic to assist in calculations. – Justin L. May 20 '10 at 09:18
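For the A* use case described in these comments, the following is a minimal sketch of how the Levenshtein distance could serve as the lower-bound heuristic. It is not the asker's implementation: the names `a_star` and `neighbors`, the lowercase ASCII alphabet, and the requirement that `dictionary` be a Python set are all assumptions made for illustration. Since every move costs 1 and changes the edit distance to the goal by at most 1, the heuristic never overestimates the remaining path length.

```python
import heapq
import string


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def neighbors(word, dictionary):
    """Dictionary words exactly one mutation away (dictionary must be a set)."""
    out = set()
    for i in range(len(word) + 1):
        for c in string.ascii_lowercase:  # assumed alphabet
            out.add(word[:i] + c + word[i:])          # insertion
            if i < len(word):
                out.add(word[:i] + c + word[i + 1:])  # substitution
        if i < len(word):
            out.add(word[:i] + word[i + 1:])          # deletion
    out.discard(word)
    return out & dictionary


def a_star(start, goal, dictionary):
    """Shortest chain of words from start to goal, or None if unreachable."""
    # Heap entries are (f = g + h, g, word, path so far).
    heap = [(levenshtein(start, goal), 0, start, [start])]
    best_g = {start: 0}
    while heap:
        _, g, word, path = heapq.heappop(heap)
        if word == goal:
            return path
        if g > best_g.get(word, float("inf")):
            continue  # stale entry: a cheaper route to word was found later
        for nxt in neighbors(word, dictionary):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                h = levenshtein(nxt, goal)
                heapq.heappush(heap, (ng + h, ng, nxt, path + [nxt]))
    return None


words = {"cat", "cot", "cog", "dog", "dot"}
print(a_star("cat", "dog", words))
```

Storing the whole path in each heap entry keeps the sketch short; a parent map would use less memory on large dictionaries.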
2

An excellent reference on "Edit distance" is section 6.3 of the Algorithms textbook by S. Dasgupta, C. H. Papadimitriou, and U. V. Vazirani, a draft of which is available freely here.

Dijkstra
1

If you have a reasonably sized (small) dictionary, a breadth-first tree search might work.

So start with all the words your word can mutate into, then all the words those can mutate into (except the original), then go down to the third level, and so on until you find the word you are looking for.

You could eliminate divergent words (ones further away from the target), but doing so might cause you to fail in a case where you must go through some divergent state to reach the shortest path.
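A minimal sketch of that level-by-level search, with no pruning of divergent words, might look like the following. The names `bfs_path` and `neighbors` and the lowercase ASCII alphabet are assumptions made for the example, and `dictionary` is assumed to be a set of valid words.

```python
from collections import deque
import string


def neighbors(word, dictionary, alphabet=string.ascii_lowercase):
    """Return the dictionary words that are exactly one mutation away."""
    candidates = set()
    for i in range(len(word) + 1):
        for c in alphabet:
            candidates.add(word[:i] + c + word[i:])          # insert c at i
            if i < len(word):
                candidates.add(word[:i] + c + word[i + 1:])  # replace letter i
        if i < len(word):
            candidates.add(word[:i] + word[i + 1:])          # remove letter i
    candidates.discard(word)
    return [w for w in candidates if w in dictionary]


def bfs_path(start, goal, dictionary):
    """Shortest chain of words from start to goal, or None if unreachable."""
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the parent links back to the start to rebuild the chain.
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in neighbors(word, dictionary):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


words = {"cat", "cot", "cog", "dog", "dot", "coat"}
print(bfs_path("cat", "dog", words))
```

Because each level is fully explored before the next, the first time the goal is dequeued the chain found is a shortest one, at the cost of visiting every word reachable in fewer steps.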

Bill K
  • Well, I already have my search algorithm (A*) implemented, and it accounts for divergent words pretty well, the same way it can find the best path around a mountain by first moving away from the mountain and going around instead of always picking the closest point. It has a neat priority system, but all of it relies on a reliable minimum-distance heuristic. In pathfinding, that's a straight line that ignores all obstacles; this would be the linguistic equivalent. – Justin L. May 14 '10 at 07:13
  • So then I don't know of any way except for trying every path and finding the shortest. Given the two words and taking your first step (including your algorithm) how many words would you expect to have to check branching off the first word? If it's just 10 or so you could probably just do a breadth-first search of the entire tree. If it's much more you might have to do a depth-first until you hit a depth of 3 or so then do a breadth-first of that node just to stay within memory constraints. With chess programs I think they do this but are good at throwing away bad paths. – Bill K May 14 '10 at 16:29