
Does Apache's Solr search engine provide approximate string matching, e.g. via the Levenshtein algorithm?

I'm looking for a way to find customers by last name, but I cannot guarantee that the names are spelled correctly. How can I configure Solr so that it finds the person "Levenshtein" even if I search for "Levenstein"?

kellyfj
prinzdezibel

2 Answers


Typically this is done with the SpellCheckComponent. By default it uses the Lucene SpellChecker internally, and that spell checker measures similarity with the Levenshtein distance.

The Solr wiki explains very well how it works, how to configure it, and which options are available, so there's no point repeating that here.
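For intuition about what "Levenshtein" means here, a minimal Python sketch of the classic edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into another). This is only an illustration of the algorithm, not Lucene's own code:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] = distance between the previous prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


print(levenshtein("Levenshtein", "Levenstein"))  # 1 (drop the "h")
```

A distance of 1 is small enough that a spell checker or fuzzy query configured for one or two edits would treat the two spellings as a match.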

Alternatively, you could just use Lucene's fuzzy search operator (~).
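A minimal sketch of sending such a fuzzy query to Solr's `/select` handler from Python. The host, the core name `customers`, and the field `last_name` are hypothetical; adjust them to your schema. In Lucene/Solr 4 and later the number after `~` is the maximum edit distance (0 to 2); older versions took a similarity fraction such as `~0.8` instead. The sketch only builds the URL; fetch it with any HTTP client.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; replace with your own host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/customers/select"


def fuzzy_last_name_url(name: str, max_edits: int = 1) -> str:
    """Build a /select URL that fuzzy-matches `name` in the (hypothetical) last_name field."""
    if not 0 <= max_edits <= 2:
        raise ValueError("Lucene fuzzy queries allow at most 2 edits")
    # Assumes `name` is a single plain word; escape Lucene special
    # characters yourself if you pass raw user input.
    query = f"last_name:{name}~{max_edits}"
    return SOLR_SELECT_URL + "?" + urlencode({"q": query, "wt": "json"})


print(fuzzy_last_name_url("Levenstein"))
```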

Another option is to use a phonetic filter instead of Levenshtein.
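To show why a phonetic filter helps with this particular example, here's a minimal Python sketch of American Soundex, one of the encoders Solr's phonetic filter (`solr.PhoneticFilterFactory`) can be configured with. It's for illustration only: in Solr you would add the filter to the field's analyzer so that names are encoded at both index and query time, rather than encoding them yourself.

```python
# Soundex digit groups: letters in the same group sound alike.
_CODES = {ch: digit
          for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
          for ch in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result = [name[0].upper()]
    last_code = _CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels (and y) map to ""
        if code and code != last_code:
            result.append(code)
        last_code = code  # a vowel resets this, so a repeated code counts again
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Levenshtein"), soundex("Levenstein"))  # L152 L152
```

Both spellings encode to the same token, so a search for one finds the other without any edit-distance computation. The trade-off is that phonetic codes are coarse and also match names that merely sound similar.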

Mauricio Scheffer
  • Mauricio, could you check the two links to the fuzzy search operator and the phonetic filter? both appear to be broken. Thanks! – reto Jul 10 '12 at 08:13

Great answer by Mauricio. My only "cheapo" addition: append the ~ character to every term you want to fuzzy-match before the query goes to Solr. If you are using the default setup, this gives you fuzzy matching.
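A minimal sketch of that "cheapo" approach in Python, assuming the user types plain words (the function name `fuzzify` is hypothetical). It deliberately keeps only bare words: as the comments below point out, wrapping a term in quotes turns `~N` into a phrase/proximity modifier rather than an edit distance.

```python
import re


def fuzzify(user_input: str) -> str:
    """Append Lucene's fuzzy operator (~) to every plain word in the input."""
    # Keep only word characters; quotes, field prefixes and other operators
    # are dropped, so the result can't accidentally become a phrase query.
    words = re.findall(r"\w+", user_input)
    return " ".join(word + "~" for word in words)


print(fuzzify("Levenstein Smith"))  # Levenstein~ Smith~
```

With a bare `~` and no number, Lucene 4 and later allow up to two edits per term.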

MattMcKnight
  • @MattMcKnight: I want to use the same distance measure in Solr, but **~** isn't working for me. I tried **?q=term:"apple"~2**. Any help? – iNikkz Dec 16 '14 at 12:20
  • @iNikkz If you put quotes around apple, I think it becomes a phrase query, so the ~2 means proximity search instead of edit distance. Try dropping the quotes. – MattMcKnight Dec 16 '14 at 22:48
  • @MattMcKnight: I tried dropping the quotes, but it returns too many results because I use phonetic filtering on both index and query. I've posted my question here: http://stackoverflow.com/questions/27484326/getting-most-likely-documents-of-the-query-using-phonetic-filter-in-solr. Could you help me, please? – iNikkz Dec 17 '14 at 05:20