I'm running Solr 5.5.1 with the following filters in my field analyzer definition:
<filter class="solr.MorfologikFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory"/>
It usually works great, but some words cause a problem, for example Poznań. It's a city name, but the stemmer recognizes it as a Polish noun with the base form poznanie, and that's what gets indexed. ASCII folding should make sure that a search for poznan matches documents containing poznań. But the stemmer does not recognize poznan as a form of poznanie, so the query term stays poznan while the index holds poznanie, and there is no match.
Any ideas how to resolve this?
My idea for a workaround would be to make the stemmer always retain the original token, so that poznań turns into [poznań, poznanie] instead of just [poznanie]. Is there an easy way to achieve this? Is there a reason it doesn't work like this by default?
I didn't find anything about it in the javadoc for solr.MorfologikFilterFactory.
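To make the intended behavior concrete, here is a sketch of what such a chain might look like. It is untested and rests on two assumptions: that solr.KeywordRepeatFilterFactory emits each token twice (one copy marked as a keyword), and that the Morfologik filter leaves keyword-marked tokens unstemmed. Neither has been confirmed for this Solr version.

```xml
<!-- Untested sketch: assumes the Morfologik filter skips tokens flagged as keywords. -->
<!-- Emit every token twice: one copy flagged as a keyword, one left unflagged. -->
<filter class="solr.KeywordRepeatFilterFactory"/>
<!-- Stem only the unflagged copy, so poznań would yield [poznań, poznanie]. -->
<filter class="solr.MorfologikFilterFactory"/>
<!-- Fold diacritics on both copies: [poznan, poznanie]. -->
<filter class="solr.ASCIIFoldingFilterFactory"/>
<!-- Drop the second copy when stemming left it identical to the original. -->
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
```

If those assumptions hold, a query for poznan would pass through unstemmed and match the folded original token in the index.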