I'm trying to figure out how to configure a fieldType in Solr's managed-schema to achieve the following:
(a) When searching for non-accented strings, the results are accent-insensitive.
(b) However, when searching for accented strings, the results are accent-sensitive only.
For example:
searchString -> expectedResult
Equipe -> Equipe, Equipé, Equípé, etc.
Equipé -> Equipé
Note: The wildcard (*) is irrelevant, and the words are chosen for demonstration purposes only.
My situation is a little unusual because of some requirement restrictions. My schema (below) has three fields: OName, OSearch, and ONameSearch. (Note: OSearch and ONameSearch serve different purposes in the backend, so they need to be defined identically.) The intention is for Solr to query OSearch and ONameSearch and return OName to the UI.
My original understanding was that OName would store the original value ("María") and that it would be indexed accent-insensitively ("maria"), so that a query analyzed without solr.ASCIIFoldingFilterFactory would achieve the following:
Example: {query} -> {OName = result}
q = OSearch:*equipe* OR ONameSearch:*equipe*
-> OName = Equipe, Equipé, Equípé, etc
q = OSearch:*equipé* OR ONameSearch:*equipé*
-> OName = Equipé
This is my schema so far...
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <!-- Index time: accents are folded to ASCII -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain, but without ASCIIFoldingFilterFactory -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<field name="OName" type="lowercase" indexed="true" stored="true" />
<field name="OSearch" type="text_en_splitting_tight" indexed="true" stored="false" multiValued="true" />
<field name="ONameSearch" type="text_en_splitting_tight" indexed="true" stored="false" multiValued="true" />
<copyField source="OName" dest="OSearch" />
<copyField source="OName" dest="ONameSearch" />
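For comparison, here is a minimal sketch of one variation that may be worth testing against requirements (a) and (b). It assumes a Solr version in which solr.ASCIIFoldingFilterFactory supports the preserveOriginal attribute. The fieldType name text_fold_keep_original is hypothetical, and the synonym, stop word, word delimiter and stemming filters from the schema above are left out only to keep the sketch short. The idea is to index both the folded and the original token, and to skip folding at query time.

```xml
<!-- Sketch only: "text_fold_keep_original" is a hypothetical name, not part of the schema above -->
<fieldType name="text_fold_keep_original" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- preserveOriginal="true" keeps the accented token alongside the folded one,
         e.g. "equipé" is indexed as both "equipe" and "equipé" -->
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No folding at query time: "equipe" stays "equipe", "equipé" stays "equipé" -->
  </analyzer>
</fieldType>
```

Under these assumptions, a query for equipe should match the folded token indexed for Equipe, Equipé and Equípé, while a query for equipé should match only the preserved original token of Equipé. The Analysis screen in the Solr admin UI is one way to check what each chain actually emits.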
Please advise, thanks!
Most, if not all, of the relevant resources I've looked into:
How to ignore accent search in Solr
How to ignore accents in SOLR search?
SOLR and accented characters
Solr accent removal
SOLR Makes Search with Accented Characters Easy
Solr Ref Guide 6.6 Defining Fields
Solr Ref Guide 6.6 Copying Fields