I came across this variation of the edit-distance problem:

Find the shortest path from one word to another, for example storm->power, validating each intermediate word with an isValidWord() function. There is no other access to the dictionary of words, so a graph cannot be constructed up front.

I am trying to figure this out, but it doesn't seem to be a distance-related problem per se. Use simple recursion, maybe? But then how do you know that you're going in the right direction?

Anyone else find this interesting? Looking forward to some help, thanks!

ToyYoda
  • If you don't have access to the dictionary in use, you're left with a brute-force solution. – Lasse V. Karlsen Mar 25 '11 at 13:26
  • When you say "no other access to the dictionary of words", do you mean that you don't have access to the actual dictionary that `IsValidWord` is using, or that you cannot use a dictionary at all? Would using a separate dictionary be OK, in the hopes that it matches the one that `IsValidWord` is using? – Lasse V. Karlsen Mar 25 '11 at 13:28
  • no, bool IsValidWord() is the only way to find out if the word you've constructed is valid or not – ToyYoda Mar 25 '11 at 13:32
  • Does this also mean you can use any (non-lexical) a priori knowledge about the language? If you can, it's quite an interesting problem, otherwise you'll just have to brute-force it. – biziclop Mar 25 '11 at 13:56
  • no, you can't. regarding your suggestion, how can you bruteforce it AND find the shortest path from one word to the other? – ToyYoda Mar 25 '11 at 14:00
  • Although by brute force, I really mean A* with the Hamming distance as a heuristic. :) Which is essentially what Jeff Foster's answer says without saying it's an A*. – biziclop Mar 25 '11 at 14:01

2 Answers

This is a puzzle from Lewis Carroll known as Word Ladders. Donald Knuth covers this in The Stanford GraphBase.

You can view it as a breadth-first search. Ideally you would have access to a dictionary of words, because otherwise the space you have to search is huge. If all you have is a validity check, you can generate every single-edit variant of the current word and then use isValidWord() to filter the candidates down to real words (Norvig's "How to Write a Spelling Corrector" is a great explanation of generating the edits).
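A minimal Python sketch of that idea, under a few assumptions: moves are single-letter substitutions only (so both words have the same length), words are lowercase ASCII, and `is_valid_word` is a hypothetical stand-in for the opaque isValidWord() oracle, backed here by a toy word set purely so the example runs. The names `neighbours` and `shortest_ladder` are also made up for illustration.

```python
from collections import deque
from string import ascii_lowercase

# Hypothetical toy dictionary; the real oracle is a black box.
_TOY_WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm"}


def is_valid_word(word):
    """Stand-in for isValidWord(): the only dictionary access allowed."""
    return word in _TOY_WORDS


def neighbours(word):
    """Yield valid words that differ from `word` in exactly one letter."""
    for i in range(len(word)):
        for c in ascii_lowercase:
            if c != word[i]:
                candidate = word[:i] + c + word[i + 1:]
                if is_valid_word(candidate):
                    yield candidate


def shortest_ladder(start, goal):
    """Breadth-first search; returns a shortest list of words, or None."""
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk the parent links back to the start.
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in neighbours(word):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


print(shortest_ladder("cold", "warm"))
```

Because breadth-first search visits words in order of distance from the start, the first time it reaches the goal it has a shortest ladder. The graph is never built up front; it is discovered one word at a time through the oracle. Insertions and deletions could be added to `neighbours` in the style of Norvig's edits if the puzzle allows them.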

You can guide the search by trying to minimize the edit distance between where you currently are and where you want to be. For example, generate all the candidate nodes, sort them by edit distance to the target, and follow the closest ones first. In the example, follow the nodes that are closest to "power".
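One way to make this guided search concrete is A* with the Hamming distance as the heuristic, as suggested in the comments on the question. The sketch below is not the Haskell implementation mentioned in this answer; it assumes substitution-only moves between equal-length lowercase words, and `is_valid_word`, `hamming`, `neighbours` and `a_star_ladder` are hypothetical names, with a toy word set standing in for the real oracle.

```python
import heapq
from string import ascii_lowercase

# Hypothetical toy dictionary; the real oracle is a black box.
_TOY_WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm"}


def is_valid_word(word):
    """Stand-in for isValidWord()."""
    return word in _TOY_WORDS


def hamming(a, b):
    """Number of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbours(word):
    """Yield valid words that differ from `word` in exactly one letter."""
    for i in range(len(word)):
        for c in ascii_lowercase:
            if c != word[i]:
                candidate = word[:i] + c + word[i + 1:]
                if is_valid_word(candidate):
                    yield candidate


def a_star_ladder(start, goal):
    """A* search; returns a shortest list of words, or None."""
    # Heap entries are (steps so far + heuristic, steps so far, word, path).
    frontier = [(hamming(start, goal), 0, start, [start])]
    best_steps = {start: 0}
    while frontier:
        _, steps, word, path = heapq.heappop(frontier)
        if word == goal:
            return path
        if steps > best_steps.get(word, float("inf")):
            continue  # a cheaper route to this word was found already
        for nxt in neighbours(word):
            new_steps = steps + 1
            if new_steps < best_steps.get(nxt, float("inf")):
                best_steps[nxt] = new_steps
                priority = new_steps + hamming(nxt, goal)
                heapq.heappush(frontier, (priority, new_steps, nxt, path + [nxt]))
    return None


print(a_star_ladder("cold", "warm"))
```

With substitution-only moves, each step can fix at most one mismatched position, so the Hamming distance never overestimates the remaining steps and the result is still a shortest ladder. Sorting purely by distance to the target (ignoring steps already taken) is usually faster but can return a longer path.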

I found this interesting as well, so there's a Haskell implementation here which works reasonably well. There's a link in the comments to a Clojure version which has some really nice visualizations.

John
Jeff Foster
  • thank you for your reply. as I said, there is no access to a dictionary apart from the isValidWord() function which can only tell you if it is a dictionary word or not. – ToyYoda Mar 25 '11 at 13:27
  • You can generate the edits and use isValidWord() to filter them down to just the valid words. I updated the description to explain a bit more. – Jeff Foster Mar 25 '11 at 13:30
  • No, one edit at a time is not going to be 11.8 million! "For a word of length n, there will be n deletions, n-1 transpositions, 26n alterations, and 26(n+1) insertions, for a total of 54n+25 (of which a few are typically duplicates). For example, len(edits1('something')) -- that is, the number of elements in the result of edits1('something') -- is 494." (from the Norvig link). Using isValidWord() for this few words should be fine. – Jeff Foster Mar 25 '11 at 13:37
  • and I think it should be assumed that `isValidWord()` should be part of the algorithm, rather than be used to reconstruct the 5-letter words of the dictionary (although, it is a pretty neat idea!) – ToyYoda Mar 25 '11 at 13:37
  • @Jeff But after 1 level of substitutions, how does he know which one would move him in the right direction? For instance, he may not be able to transition from one word to another by just replacing distinct characters each iteration, he may have to first substitute the first character, then the second, then the first again, to get a working path. Without a dictionary *at all*, this quickly balloons out of control. – Lasse V. Karlsen Mar 25 '11 at 13:40
  • @Lasse - good point. A simple heuristic would be to choose nodes that are closest (in terms of edit distance) to the target solution. I used this with my Haskell implementation and could get paths computed in a few milliseconds between arbitrary words. I've updated the answer to include this (it's a fairly important bit I forgot, so thanks for pointing it out!). – Jeff Foster Mar 25 '11 at 13:45
You can search from two sides at the same time. I.e. change a letter in storm and run it through isValidWord(), and change a letter in power and run it through isValidWord(). If those two words are the same, you have found a path.
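A rough Python sketch of the two-sided search, extended to more than one step per side. It assumes substitution-only moves, lowercase words of equal length, and that both endpoints are valid words; `is_valid_word` is a hypothetical stand-in for isValidWord() backed by a toy word set, and the other names are made up for illustration.

```python
from string import ascii_lowercase

# Hypothetical toy dictionary; the real oracle is a black box.
_TOY_WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm"}


def is_valid_word(word):
    """Stand-in for isValidWord()."""
    return word in _TOY_WORDS


def neighbours(word):
    """Yield valid words that differ from `word` in exactly one letter."""
    for i in range(len(word)):
        for c in ascii_lowercase:
            if c != word[i]:
                candidate = word[:i] + c + word[i + 1:]
                if is_valid_word(candidate):
                    yield candidate


def _join(meet, parents_fwd, parents_bwd):
    """Stitch the two half-paths together at the meeting word."""
    path = []
    word = meet
    while word is not None:  # meet -> start, reversed afterwards
        path.append(word)
        word = parents_fwd[word]
    path.reverse()
    word = parents_bwd[meet]
    while word is not None:  # continue from meet towards the goal
        path.append(word)
        word = parents_bwd[word]
    return path


def bidirectional_ladder(start, goal):
    """Breadth-first search from both ends, one whole level at a time."""
    if start == goal:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = [start], [goal]
    while frontier_fwd and frontier_bwd:
        # Always grow the smaller frontier to keep the work balanced.
        forward = len(frontier_fwd) <= len(frontier_bwd)
        frontier = frontier_fwd if forward else frontier_bwd
        parents = parents_fwd if forward else parents_bwd
        other = parents_bwd if forward else parents_fwd
        next_frontier = []
        for word in frontier:
            for nxt in neighbours(word):
                if nxt in parents:
                    continue
                parents[nxt] = word
                if nxt in other:  # the two searches have met
                    return _join(nxt, parents_fwd, parents_bwd)
                next_frontier.append(nxt)
        if forward:
            frontier_fwd = next_frontier
        else:
            frontier_bwd = next_frontier
    return None


print(bidirectional_ladder("cold", "warm"))
```

Each side only has to explore roughly half the depth, which is where the savings come from. Expanding a whole level at a time keeps the shortest-path guarantee of plain breadth-first search.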

Sjoerd
  • thanks for your reply. searching from two sides using both words didn't cross my mind. thanks for the idea! a single letter change though in both words could produce two identical words but in this case with storm and power this can never happen :/ – ToyYoda Mar 25 '11 at 13:29
  • @ToyYoda The idea is that if you start from both ends, you have to visit fewer nodes. If a word has `k` average valid neighbours and the distance between the two words is `l`, you'll only have to visit `2*k^(l/2)` words, not `k^l` – biziclop Mar 25 '11 at 13:59