I have a string comparison function based on levenshtein(), but it doesn't work properly.

function levenshteinTest($input, $array)
{
    $closest = null;  // stays null if $array is empty
    $shortest = -1;   // -1 means "no distance recorded yet"
    foreach ($array as $word)
    {
        $lev = levenshtein($input, $word);
        // Exact match: stop searching.
        if ($lev == 0)
        {
            $closest = $word;
            $shortest = 0;
            break;
        }
        // Keep this word if it is at least as close as the best so far.
        if ($lev <= $shortest || $shortest < 0)
        {
            $closest  = $word;
            $shortest = $lev;
        }
    }
    return $closest;
}
$test = array(
    "Richard Bürstmayr",
    "Sandra Ebner"
);
var_dump(levenshteinTest("brstmyr", $test)); //Sandra Ebner
var_dump(levenshteinTest("rd brstmyr", $test)); //Richard Bürstmayr

As you can see, the first dump gives a bad result and the second a good one. I think the problem has something to do with word length, but I can't figure out how to fix it. All of my array values contain at least two words.

Lithilion

1 Answer

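You aren't getting a weird result.

I ran the test at http://writecodeonline.com/php.

For the first input, "Sandra Ebner" has a smaller Levenshtein distance than "Richard Bürstmayr", so the function returns it.

Remember that the Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into the other. A much longer candidate needs many insertions just to make up the length difference, so a short input tends to match the shorter candidate.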
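
To see the effect directly, a minimal sketch like the following collects the raw distances for both inputs from the question. The `$distances` array is only an illustrative name, and the script assumes the file is saved as UTF-8; PHP's `levenshtein()` compares bytes, so "ü" then counts as two positions.

```php
<?php
// Inputs and candidates taken from the question.
$candidates = ["Richard Bürstmayr", "Sandra Ebner"];
$inputs     = ["brstmyr", "rd brstmyr"];

// $distances[input][candidate] = raw Levenshtein distance (illustrative structure).
$distances = [];
foreach ($inputs as $input) {
    foreach ($candidates as $word) {
        // levenshtein() is byte-based and case-sensitive.
        $distances[$input][$word] = levenshtein($input, $word);
    }
}

print_r($distances);
```

For "brstmyr", the entry for "Sandra Ebner" comes out smaller than the one for "Richard Bürstmayr", which is why the original function picks it.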

Matt Burrow
  • I know that I'm getting a smaller distance, but you can see that it "should" be another result. And I wanna know how to do that – Lithilion Jul 17 '14 at 16:15
  • I do know what you mean, but the strings have different lengths, and that affects the Levenshtein distance. You could use the PHP function similar_text, which can calculate a percentage of similarity: [similar_text](http://php.net/manual/en/function.similar-text.php) – Matt Burrow Jul 17 '14 at 16:19
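
The `similar_text` suggestion could be sketched roughly as follows. The helper name `closestBySimilarity` is hypothetical (it is not from the question or the answer), and this is only one way to use the percentage that `similar_text()` writes into its third, by-reference argument. Note that `similar_text()` is also byte-based and case-sensitive.

```php
<?php
// Hypothetical helper: return the candidate with the highest
// similar_text() percentage, or null for an empty list.
function closestBySimilarity(string $input, array $candidates): ?string
{
    $best = null;
    $bestPercent = -1.0;
    foreach ($candidates as $word) {
        // similar_text() stores the similarity percentage in $percent.
        similar_text($input, $word, $percent);
        if ($percent > $bestPercent) {
            $bestPercent = $percent;
            $best = $word;
        }
    }
    return $best;
}

$test = ["Richard Bürstmayr", "Sandra Ebner"];
var_dump(closestBySimilarity("brstmyr", $test));
var_dump(closestBySimilarity("rd brstmyr", $test));
```

With the two names from the question, both inputs should resolve to "Richard Bürstmayr", because the shared run "rstm" weighs more than the length difference. Whether that holds for a larger list of names would need testing.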