I have a basic search script which I'm working on. I want users to be able to enter several keywords. If one of these keywords is misspelt, I want to correct that word for the search results and/or display a "did you mean ..." message.

I have tried levenshtein() but it only seems to work for a single word and doesn't seem very reliable anyway. While testing with this function, I came up with this:

<?php
$input = 'ornage ptoato';

$possible_words = explode(' ', trim(strtolower($input)));

foreach($possible_words as $value){

   $words  = array('sony','red', 'indigo','orange','bell','toshiba','potato');

   $shortest = -1;

   foreach ($words as $word) {

       $lev = levenshtein($value, $word);

       if ($lev == 0) {

           $closest = $word;
           $shortest = 0;

           break;
       }

       if ($lev <= $shortest || $shortest < 0) {
           // set the closest match, and shortest distance
           $closest  = $word;
           $shortest = $lev;
       }
   }

}
echo "Input word: $input<br>";
if ($shortest == 0) {
    echo "Exact match found: $closest";
} else {
    echo "Did you mean: $closest?\n";
}

?>

There is a foreach within a foreach because I was trying to do it for each word within the search string.

I basically want it to work like Google's "did you mean ..." and eBay's "0 results found for one two theer, so we searched for one two three".

tshepang

1 Answer

Your code needed a little tweaking.

<?php
$input = 'ornage ptoato toshiba butts';
$possible_words = explode(' ', trim(strtolower($input)));
$words = array('sony','red', 'indigo','orange','bell','toshiba','potato');
$threshold = 4; // a suggestion must be fewer than this many edits away

foreach($possible_words as $value){
    $shortest = -1; // -1 means no distance has been recorded yet
    if( in_array($value, $words) ) {
        printf("Exact match for word: %s\n", $value);
    } else {
        // no exact match: find the dictionary word with the smallest distance
        foreach ($words as $word) {
            $lev = levenshtein($value, $word);

            if ($lev <= $shortest || $shortest < 0) {
                // set the closest match, and shortest distance
                $closest  = $word;
                $shortest = $lev;
            }
        }
        // report per keyword, so this check sits inside the outer loop
        if($shortest < $threshold) {
            printf("You typed: %s.\nAssuming you meant: %s\n", $value, $closest);
        } else {
            printf("Could not find acceptable match for: %s\n", $value);
        }
    }
}

  1. The check for acceptable matches needed to go inside the outer loop, so that it runs once per keyword instead of only for the last one.
  2. You can use in_array() to check for an exact match before calculating the Levenshtein distance.
  3. You probably only want to match words within reason (hence $threshold).

Output:

You typed: ornage.
Assuming you meant: orange
You typed: ptoato.
Assuming you meant: potato
Exact match for word: toshiba
Could not find acceptable match for: butts
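
If you want a single "Did you mean ...?" line for the whole query instead of one message per keyword, the same loop can collect the corrected words and join them back together. The following is only a rough sketch: suggestQuery() is a hypothetical helper name, not a built-in, and it assumes that keywords without an acceptable match should be left unchanged. It also uses a strict < comparison, so on ties the first closest word wins.

```php
<?php
// Hypothetical helper (not part of PHP): returns the corrected query string,
// or null when no keyword was replaced.
function suggestQuery(string $input, array $words, int $threshold = 4): ?string
{
    // Split on any run of whitespace and drop empty pieces.
    $keywords = preg_split('/\s+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
    $corrected = [];
    $changed = false;

    foreach ($keywords as $keyword) {
        // Exact match: keep the keyword as typed.
        if (in_array($keyword, $words, true)) {
            $corrected[] = $keyword;
            continue;
        }

        // Otherwise find the dictionary word with the smallest distance.
        $closest = null;
        $shortest = PHP_INT_MAX;
        foreach ($words as $word) {
            $lev = levenshtein($keyword, $word);
            if ($lev < $shortest) {
                $closest = $word;
                $shortest = $lev;
            }
        }

        if ($closest !== null && $shortest < $threshold) {
            $corrected[] = $closest;
            $changed = true;
        } else {
            // Assumption: keep keywords we cannot correct.
            $corrected[] = $keyword;
        }
    }

    return $changed ? implode(' ', $corrected) : null;
}

$words = ['sony', 'red', 'indigo', 'orange', 'bell', 'toshiba', 'potato'];
$suggestion = suggestQuery('ornage ptoato toshiba', $words);
if ($suggestion !== null) {
    echo "Did you mean: $suggestion?\n";
}
```

You could then run the search with the returned string and show the message above the results, which is roughly the eBay-style behaviour described in the question.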
Sammitch