Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other.
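The definition above maps directly onto the classic dynamic-programming algorithm. The sketch below assumes unit cost for every edit and compares bytes, which matches PHP's built-in `levenshtein()` with its default costs. The function name `levenshtein_dp` is hypothetical and is only there to show how the distance is computed.

```php
<?php
// Hypothetical helper: dynamic-programming Levenshtein distance using two rows.
// Every insertion, deletion and substitution costs 1; strings are compared byte by byte.
function levenshtein_dp(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // Row 0 of the DP table: turning "" into the first $j bytes of $b takes $j insertions.
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {
        // Column 0: turning the first $i bytes of $a into "" takes $i deletions.
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution (free if the bytes match)
            );
        }
        $prev = $curr;
    }
    return $prev[$n];
}

echo levenshtein_dp('kitten', 'sitting'), "\n";       // 3
echo levenshtein_dp('htc corporation', 'htc'), "\n";  // 12
```

Only two rows are kept because each cell depends only on the current and previous row, so memory is O(n) instead of O(m·n).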
Here is a simple analysis comparing `levenshtein()`, `similar_text()` and a count of shared characters:
```php
<?php
$input = 'htc corporation';

// array of words to check against
$words = array(
    'htc',
    'Sprint Nextel',
    'Sprint',
    'banana',
    'orange',
    'radish',
    'carrot',
    'pea',
    'bean'
);

foreach ($words as $word) {
    // Characters of $input that also occur in $word (repeated characters in $input are all kept)
    $ic = array_intersect(str_split($input), str_split($word));
    printf("%s \t l= %s , s = %s , c = %d \n", $word,
        levenshtein($input, $word),
        similar_text($input, $word),
        count($ic));
}
```
Output:

```
htc l= 12 , s = 3 , c = 5
Sprint Nextel l= 14 , s = 3 , c = 8
Sprint l= 12 , s = 1 , c = 7
banana l= 14 , s = 2 , c = 2
orange l= 12 , s = 4 , c = 7
radish l= 12 , s = 3 , c = 5
carrot l= 11 , s = 1 , c = 10
pea l= 13 , s = 2 , c = 2
bean l= 13 , s = 2 , c = 2
```
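To make the problem in the table concrete, here is a small sketch that ranks the same candidates by Levenshtein distance alone and takes the closest one. It uses the same `$input` and word list as above; `array_key_first()` assumes PHP 7.3 or later.

```php
<?php
$input = 'htc corporation';
$words = ['htc', 'Sprint Nextel', 'Sprint', 'banana', 'orange', 'radish', 'carrot', 'pea', 'bean'];

// Distance from the whole input string to each candidate.
$distances = [];
foreach ($words as $word) {
    $distances[$word] = levenshtein($input, $word);
}

// Smallest distance first; asort() keeps the word => distance association.
asort($distances);

// The "best" match according to Levenshtein alone.
echo array_key_first($distances), "\n"; // carrot
```

Because the whole input is compared against each short candidate, most of the cost is spent deleting the extra characters of "corporation", and `carrot` happens to line up with a few of those letters.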
It is clear that `htc` has a distance of 12 while `carrot` has 11, so Levenshtein alone would pick `carrot` as the closer match. If you want `htc`, Levenshtein alone is not enough: you need to compare exact words first and then set priorities.
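One way to "compare exact words, then set priorities" is sketched below. It is an illustration under assumptions, not a definitive method: the helper name `best_match` is hypothetical, the input is split on whitespace and lower-cased, an exact (case-insensitive) word match always wins, and otherwise the candidate with the smallest Levenshtein distance to any single input word is chosen.

```php
<?php
// Hypothetical helper: rank candidates by (1) exact word match, then
// (2) the smallest Levenshtein distance to any single word of the input.
function best_match(string $input, array $candidates): ?string
{
    // Lower-case the input and split it into words on whitespace.
    $inputWords = preg_split('/\s+/', strtolower(trim($input)), -1, PREG_SPLIT_NO_EMPTY);
    if (!$inputWords) {
        return null;
    }

    $best = null;
    $bestExact = null;
    $bestDist = null;
    foreach ($candidates as $candidate) {
        $c = strtolower($candidate);
        // Priority 1: 0 if the candidate is exactly one of the input words, else 1.
        $exact = in_array($c, $inputWords, true) ? 0 : 1;
        // Priority 2: distance to the closest individual input word.
        $dist = PHP_INT_MAX;
        foreach ($inputWords as $w) {
            $dist = min($dist, levenshtein($w, $c));
        }
        // Keep the candidate if it beats the current best on (exact, dist) in that order.
        if ($best === null || $exact < $bestExact
            || ($exact === $bestExact && $dist < $bestDist)) {
            $best = $candidate;
            $bestExact = $exact;
            $bestDist = $dist;
        }
    }
    return $best;
}

$words = ['htc', 'Sprint Nextel', 'Sprint', 'banana', 'orange', 'radish', 'carrot', 'pea', 'bean'];
echo best_match('htc corporation', $words), "\n"; // htc
```

Comparing per word rather than against the whole string also stops the length of "corporation" from dominating the distance. A multi-word candidate such as `Sprint Nextel` can never be an exact single-word match here, so a real implementation would need its own rule for phrases.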