5

We are evaluating a bunch of options at the moment for Fuzzy searches. One of our requirements is that mistyped words be matched. An OCR of CRA1G (with the number 1) must match 'CRAIG'.

Full text search in sql server seems to get close, but no cigar.

Are there some options besides 'FORMSOF(Inflectional, CRA1G)' that might be able to make substitutions for these types of mismatches and still get results?

I am unable to find anything similar SOUNDEX is definitely not good enough.

I might try Lucene next, which has good fuzzy search capability.

Regards

Craig.

Jim
  • 14,952
  • 15
  • 80
  • 167

3 Answers3

3

SQL Server 2008 supports thesauruses should get you to where you need. You would define your misspellings as synonyms. See http://msdn.microsoft.com/en-us/library/ms142491.aspx.

Edit: To my understanding, you couldn't do this using a wildcard or regular expression approach. You'd have to define common replacement set patterns for your words that you anticipate will be most commonly mis-typed.

Shan Plourde
  • 8,528
  • 2
  • 29
  • 42
  • It will probably be too difficult to generate all the possible replacement patterns. We will give it a try, if it doesn't work I am going to use lucene. – Jim Apr 26 '11 at 04:58
2

This might not help you, but we've had this problem for names. In german, there are alot of similar sounding names: Bayer, Beyer, Baier would all sound the same. For this, we've created a seperate column using phonetic spelling. You might create a similar alphabet for your fuzziness, where "I" equals "1", and get similar results.

0xCAFEBABE
  • 5,576
  • 5
  • 34
  • 59
0

I know it's an old thread, but many of us face the same problem till today. One good solution is searloc. It is a CLR library with zero dependencies and with many features. It supports full text search, phonetic match for all languages, keyboard match, fuzzy search, and many others. And the most important it is very fast, it needs just a few milliseconds to search million of records.

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/32042767) – paneerakbari Jun 21 '22 at 20:07