I want to keep two versions of the words in my Solr index: one with AsciiFolding applied and one without.

For example, if the user types:

gru

I want to suggest back

grün

AsciiFolding converts the umlauts to their base letters (a, u, o), but I also want to keep the original words. As it stands, if the user types:

grü

I'm unable to suggest the original word, which is not the behavior I want.

Chiron

1 Answer

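The simplest way is to use two fields, say text_original and text_ascii, and apply AsciiFolding to only one of them. Keep in mind that the copyField directive copies the raw input value, not the analyzed tokens, so use it to populate both fields from the same source and then search across both, rather than copying both results into a common field text.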
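To make the two-field idea concrete, here is a minimal sketch in plain Python rather than actual Solr configuration. Everything in it is an assumption for illustration: `fold` is a rough stand-in for Solr's ASCII folding (it only strips combining marks, so it leaves `ß` untouched), and `TwoFieldSuggester` is a hypothetical toy class whose two dictionaries play the role of text_original and text_ascii.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Strip combining marks; a simplified stand-in for ASCII folding."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    """Lowercase and compose, so index terms and queries compare equal."""
    return unicodedata.normalize("NFC", text).lower()


class TwoFieldSuggester:
    """Toy prefix index with two 'fields' (hypothetical, not a Solr API)."""

    def __init__(self, words):
        self.text_original = defaultdict(set)  # unfolded prefix -> words
        self.text_ascii = defaultdict(set)     # folded prefix -> words
        for word in words:
            term = normalize(word)
            folded = fold(term)
            for end in range(1, len(term) + 1):
                self.text_original[term[:end]].add(word)
            for end in range(1, len(folded) + 1):
                self.text_ascii[folded[:end]].add(word)

    def suggest(self, prefix: str):
        """Return original words: exact-field hits first, folded hits after."""
        query = normalize(prefix)
        exact = self.text_original.get(query, set())
        # The query is folded too, as query-time analysis on the field would do.
        folded_only = self.text_ascii.get(fold(query), set()) - exact
        return sorted(exact) + sorted(folded_only)
```

With the words `grün`, `Grund` and `groß` indexed, `suggest("gru")` returns both `Grund` and `grün`, while `suggest("grü")` ranks `grün` first because it matches in the unfolded field. Either way, the suggestion handed back is always the original word, never its folded form.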

Note, however, that this will duplicate all other words too. Alternatively, you can rewrite AsciiFolding to insert both versions of the word into the token stream (this is how synonym search works; IIRC, "Lucene in Action" has a nice explanation of the process).
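The token-stream alternative can be sketched the same way. This is an illustrative Python generator, not Lucene code: `fold` is again a simplified stand-in for ASCII folding, and `fold_preserving_original` is a hypothetical name. It emits the folded form at the same position as the original (position increment 0), which is the mechanism synonym injection relies on, and it skips the extra token when folding changes nothing, so unaffected words are not duplicated. As far as I know, newer Lucene/Solr releases offer comparable behavior through a preserveOriginal option on the ASCII folding filter, so check your version's documentation before writing a custom filter.

```python
import unicodedata
from typing import Iterable, Iterator, Tuple


def fold(text: str) -> str:
    """Strip combining marks; a simplified stand-in for ASCII folding."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_preserving_original(tokens: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (term, position_increment) pairs.

    Every input token advances the position by 1. If folding changes the
    token, the folded form follows with increment 0, so both terms share
    one position in the stream.
    """
    for token in tokens:
        yield token, 1
        folded = fold(token)
        if folded != token:
            yield folded, 0
```

For the input tokens `das`, `grün`, `haus`, only `grün` produces a second term (`grun`, with increment 0); the other tokens pass through once.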

ffriend