1

I am trying to do a search query with SQL; my page contains an input field who's value is taken and simply concatenated to my SQL statement.

So, Select * FROM users after a search then becomes SELECT * FROM users WHERE company LIKE '%georges brown%'.

It then returns results based on what the user types in; in this case Georges Brown. However, it only finds entries who's companies are exactly typed out as Georges Brown (with an 's').

What I am trying to do is return a result set that not only contains entries with Georges but also George (no 's').

Is there any way to make this search more flexible so that it finds results with Georges and George?

Kermit
  • 33,827
  • 13
  • 85
  • 121
mousesports
  • 509
  • 1
  • 6
  • 18
  • is the georges brown hardcoded by you, or is that user-entered data? if it's user-provided, you'd have to build up a list of translations as google does. "did you mean xxx?" – Marc B Oct 23 '12 at 14:33

6 Answers6

1

Try using more wildcards around george.

SELECT * FROM users WHERE company LIKE '%george% %brown%'
0

Try this query:

SELECT * 
FROM users 
WHERE company LIKE '%george% brown%'
Robert
  • 25,425
  • 8
  • 67
  • 81
0

Use SOUNDEX

http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_soundex

You can also remove last 2 characters and get SOUNDEX codes and compare them.

Raj More
  • 47,048
  • 33
  • 131
  • 198
0

In specific cases you can use a wildcard:

WHERE company LIKE '%george% brown%' -- will match `georges` but not `georgeani`

_ is a single-character wildcard, while % is a multi-character wildcard.

But maybe it's better to use another piece of software for indexing, like Sphinx.

It has: "Flexible text processing. Sphinx indexing features include full support for SBCS and UTF-8 encodings (meaning that effectively all world's languages are supported); stopword removal and optional hit position removal (hitless indexing); morphology and synonym processing through word forms dictionaries and stemmers; exceptions and blended characters; and many more."

It allows you do do smarter searches with partial matches, while providing a more accuracy than soundex, for example.

GolezTrol
  • 114,394
  • 18
  • 182
  • 210
  • Unfortunately, when I tried adding `_` as a single-character wildcard, it doesn't return any results for 'George Brown'; it only returns the results for 'Georges Brown'. – mousesports Oct 23 '12 at 14:58
  • Of course. Silly of me. `_` matches *exactly* one character. So while it may have its uses, it's not useful here. Modified the query. – GolezTrol Oct 23 '12 at 15:00
0

You'll have to look at the documentation of your database system. MySQL for example provides the SOUNDEX function.

Otherwise, what should always work and give you better matching is to only work on upper or lower cased strings. SQL-92 defines the TRIM, UPPER, and LOWER functions. So you'd do something like WHERE UPPER(company) LIKE UPPER('%georges brown%').

muehlbau
  • 1,897
  • 13
  • 23
0

Probably best to explode out your search string into individual words then find the plural / singular of each of those words. Then do a like for both possibilities for each word.

However for this to be usably efficient on large amounts of data you probably want to run against a table of words linked to each company.

Soundex alone probably isn't much use as too many words are similar (it gives you a 4 character code, the first character being the first character of the word, while the next 3 are a numeric code). Levenshtein is more accurate but MySQL has no method for this built in although php does have a fast function for this (the MySQL functions I found to calculate it were far too slow to be useful on a large search).

What I did for a similar search function was to take the input string and explode it out to words, then converting those words to their singular form (my table of used words just contain singular versions of words). For each word I then found all the used words starting with the same letter and then used levenshtein to get the best match(es). And from this listed out the possible matches. Made it possible to cope with typoes (so it would likely find George if someone entered Goerge), and also to find best matches (ie, if someone searched on 5 words but only 4 were found). Also could come up with a few alternatives if the spelling was miles out.

You may also want to look up Metaphone and Double Metaphone.

Kickstart
  • 21,403
  • 2
  • 21
  • 33
  • Exploding the string into individual words and comparing them is what I ended up doing in this case - I'm not dealing with large amounts of data. Thanks for your comment. – mousesports Oct 23 '12 at 15:30