You might consider using the Doctrine Inflector class in conjunction with a stemmer for this.
Here's the algorithm at a high level:
- Lowercase the search string and split it on spaces, then process each word individually
- Strip apostrophes
- Singularize, then replace the portion that differs from the input with a wildcard ('%')
- Stem, then replace the portion that differs from the input with a wildcard ('%')
Here's the function I put together:
```php
use Doctrine\Common\Inflector\Inflector; // static API from doctrine/inflector 1.x

/**
 * Use inflection and stemming to produce a good search string to match subtle
 * differences in a MySQL table.
 *
 * @param string $sInputString The string you want to base the search on
 * @param string $sSearchTable The table you want to search in
 * @param string $sSearchField The field you want to search
 *
 * @return string The SELECT statement to run
 */
function getMySqlSearchQuery($sInputString, $sSearchTable, $sSearchField)
{
    $aInput = explode(' ', strtolower($sInputString));
    $aSearch = [];
    foreach ($aInput as $sInput) {
        // Strip apostrophes so that "mary's" becomes "marys"
        $sInput = str_replace("'", '', $sInput);

        //--------------------
        // Inflect
        //--------------------
        $sInflected = Inflector::singularize($sInput);

        // If the singular form differs from the input, keep the common prefix
        // and replace the rest with a % (wildcard) for the MySQL query
        $iPosition = strspn($sInput ^ $sInflected, "\0");
        if ($iPosition < strlen($sInput)) {
            $sInput = substr($sInflected, 0, $iPosition) . '%';
        }

        //--------------------
        // Stem
        //--------------------
        // stem_english() is assumed to come from the PECL stem extension
        $sStemmed = stem_english($sInput);

        // If the stemmed form differs from the input, keep the common prefix
        // and replace the rest with a % (wildcard) for the MySQL query
        $iPosition = strspn($sInput ^ $sStemmed, "\0");
        if ($iPosition < strlen($sInput)) {
            $aSearch[] = substr($sStemmed, 0, $iPosition) . '%';
        } else {
            $aSearch[] = $sInput;
        }
    }
    $sSearch = implode(' ', $aSearch);

    // NOTE: the values are interpolated directly into the SQL. Only pass
    // trusted table and field names, and escape or bind the search string.
    return "SELECT * FROM $sSearchTable WHERE LOWER($sSearchField) LIKE '$sSearch';";
}
```
Which I ran with several test strings:
Input String: Mary's Hamburgers
Search String: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'mary% hamburger%';
Input String: Office Supplies
Search String: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'offic% suppl%';
Input String: Accounting department
Search String: SELECT * FROM LIST2 WHERE LOWER(some_field) LIKE 'account% depart%';
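The wildcard step leans on a PHP quirk: XOR-ing two strings gives a "\0" byte at every position where they match, so strspn() over "\0" returns the length of their common prefix. Here is a minimal sketch of that step in isolation. The helper name wildcardFromVariant is hypothetical and only for illustration; it is not part of the function above.

```php
<?php
/**
 * Hypothetical helper: return $sWord unchanged if $sVariant starts with all
 * of it, otherwise return the shared prefix followed by a MySQL '%' wildcard.
 */
function wildcardFromVariant(string $sWord, string $sVariant): string
{
    // XOR yields "\0" wherever the two strings agree; the result is only
    // as long as the shorter of the two strings.
    $iPrefixLength = strspn($sWord ^ $sVariant, "\0");

    if ($iPrefixLength < strlen($sWord)) {
        return substr($sWord, 0, $iPrefixLength) . '%';
    }

    return $sWord;
}

echo wildcardFromVariant('supplies', 'supply'), "\n";      // suppl%
echo wildcardFromVariant('hamburgers', 'hamburger'), "\n"; // hamburger%
echo wildcardFromVariant('office', 'office'), "\n";        // office
```

Because the XOR result is truncated to the shorter string, a variant that is simply a shorter prefix of the word still produces a wildcard, which is what turns "hamburgers" into "hamburger%".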
This approach is probably not perfect, but it's a good start anyway! Where it falls down is when multiple matches are returned: there's no logic to determine the best match. That's where things like MySQL full-text search and Lucene come in. Thinking about it a little more, you might be able to use levenshtein() to rank multiple results with this approach!
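As a rough sketch of that ranking idea, assuming the rows matched by the LIKE query have already been fetched into a plain array of strings: sort them by edit distance to the original input using PHP's built-in levenshtein(). The helper name rankByLevenshtein and the sample candidates are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: order candidate values by edit distance to the search
 * string, closest first. The comparison is case-insensitive.
 *
 * @param string   $sInputString The original search string
 * @param string[] $aCandidates  Values returned by the LIKE query
 *
 * @return string[] Candidates sorted from smallest to largest distance
 */
function rankByLevenshtein(string $sInputString, array $aCandidates): array
{
    $sNeedle = strtolower($sInputString);

    usort($aCandidates, function (string $sA, string $sB) use ($sNeedle): int {
        return levenshtein($sNeedle, strtolower($sA))
            <=> levenshtein($sNeedle, strtolower($sB));
    });

    return $aCandidates;
}

// Made-up candidates standing in for rows matched by 'offic% suppl%'
$aRanked = rankByLevenshtein('Office Supplies', [
    'Office Supply Warehouse',
    'Office Supplies',
    'Official Suppliers Ltd',
]);
print_r($aRanked); // the exact match 'Office Supplies' has distance 0, so it sorts first
```

This recomputes the distance on every comparison, which is fine for a handful of rows; for larger result sets it would be worth computing each distance once and sorting on the cached values. Note that before PHP 8.0, levenshtein() rejects strings longer than 255 characters.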