In my application, I have a users table, with first_name
and last_name
. I currently have a third column full_name
(automatically generated) like this : first_name + last_name + first_name (without special chars).
"Etienne", "De Crécy", "Etienne De Crecy Etienne"
At now, I have a simple algorithm to autocomplete user inputs (special chars removed):
SELECT * FROM users WHERE full_name LIKE "%input%"
This query returns Etienne with inputs Crécy Etienne
, Etienne De
, Cré
, Cre
, Etienne
I want to add some fuzzy in this query to allow users mispellings. This new algorithm should be able to return Etienne when users write:
Etiene
(similar to first name)Etienne Crecy
(similar to full name, without particule)Crecy Etienne
(similar to full name, without particule, other direction)De Cressi
(sounds like last name)Cressi
(sounds like last name, without particule)
I do a lot of searchs, the most relevant idea is to use SOUNDEX
method (or Metaphone
procedures), or levenstein
procedures. I can not use it as it, because:
- Soundex is based on the first letter, then
SOUNDEX(Cressy)
is not the same asSOUNDEX(De cressy)
, even if they are very similar. - Metaphone is base on the position of the letters (beggining by 'kn' is like begging by 'n', but only in first position)
- levenstein doesn't take care about string length : De Cressy is not similar to Cressy.
Have you any ideas about to "mix" theses methods, or do you have any other idea for me ?