3

I would like to generate all variants of a word within 1-edit distance using Levenshtein distance.

PHP has a function that will take two strings as parameter and will return just the number (int) of insert, replace and delete operations needed to transform str1 into str2. PHP Manual - levenshtein

int levenshtein ( string $str1 , string $str2 )

I am looking for a PHP solution to create an algorithm that generates the variants of a given word.

Don't Panic
  • 41,125
  • 10
  • 61
  • 80
Diego Vidal
  • 1,022
  • 12
  • 21
  • 1
    Possible duplicate of [How to find all strings at a given edit distance from a given string](http://stackoverflow.com/questions/12886997/how-to-find-all-strings-at-a-given-edit-distance-from-a-given-string) – Daniel A. Thompson Jul 07 '16 at 14:52
  • Hi Daniel, I have already read that article but it is written with python and they talk about how Google search work. I don't think my question is duplicate. I am looking for a PHP solution here. – Diego Vidal Jul 07 '16 at 14:55
  • How would You do this task using pen and paper? Let's assume that Your word is "AB" and Your alphabet is "A", "B", "C". I guess it is simple for You to solve this on the paper. Just write PHP code that does the same operations as You do. – Roman Hocke Jul 07 '16 at 15:07
  • I don't think it is that simple. I am looking for all variants of a word within 1-edit distance using Levenshtein distance (i.e. insertions, deletions or substitutions). For the word: "kingdom" I need to generate: kongdom, pingdom..., kngdom...,kingdomo... – Diego Vidal Jul 07 '16 at 15:13

1 Answers1

4

This is pretty easy for distance 1. Generating all possibilities for distances > 1 becomes somewhat more complex.

Start with a word:

$input = 'word';

Split the word into letters and generate a list of replacements.

$letters = str_split($input);

$alphabet = range('a', 'z');

Deletions are the easiest, just loop over each position and replace with '':

foreach ($letters as $i => $letter) {
    $variants[] = substr_replace($input, '', $i, 1);
}

Insertions and replacements can be done at the same time, because they'll both require a loop over the letters in the input nested inside a loop over the alphabet.

foreach ($alphabet as $variation) {
    foreach ($letters as $i => $letter) {

        // insertion
        $variants[] = substr($input, 0, $i) . $variation . substr($input, $i);

        // substitution
        // (check that the letter is different or you'll get multiple copies of the input)
        if ($variation != $letter) {
            $variants[] = substr_replace($input, $variation, $i, 1);
        }
    }
    $variants[] = $input . $variation; // handle insertion at the end
}

You can check the results to verify the levenshtein distances are correct:

foreach ($variants as $variant) {
    $result[$variant] = levenshtein($input, $variant);
}
Don't Panic
  • 41,125
  • 10
  • 61
  • 80