We use Solr 5.4 and have some text fields defined as text_de, with the following definition in schema.xml:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_de.txt" format="snowball" ignoreCase="true"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.GermanLightStemFilterFactory"/>
  </analyzer>
</fieldType>
This is the default configuration. I wonder why a search for name:Rosewein returns no results, while name:Roséwein returns the related entries.
So I queried the field name with some special characters and enabled the debugQuery option, which resulted in:
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "debugQuery": "true",
      "indent": "true",
      "q": "name:ÁÀÂÄÃåĀĂÆæöüßéèêíóú",
      "_": "1459935371889",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "name:ÁÀÂÄÃåĀĂÆæöüßéèêíóú",
    "querystring": "name:ÁÀÂÄÃåĀĂÆæöüßéèêíóú",
    "parsedquery": "name:aaaaãåāăææousséèêiou",
    "parsedquery_toString": "name:aaaaãåāăææousséèêiou",
    "explain": {},
    "QParser": "LuceneQParser",
    ...
Have a look at the parsedquery field: it shows that not all variants are replaced with their ASCII representation. I cannot use ASCIIFoldingFilterFactory as a filter, because German umlauts could then get lost: in some cases they are converted from ü to ue, and so on.
But what I can't understand is this: why are í, ó, ú and á converted to i, o, u and a, while é is kept as é?
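The pattern in parsedquery can be reproduced with a simplified Python model of the chain. This is a sketch, not Solr code: which filter folds which character is my assumption (based on a reading of Lucene's GermanNormalizationFilter and GermanLightStemmer), not verified against the Solr 5.4 source, and the names GERMAN_NORMALIZATION, LIGHT_STEM_FOLDING and analyze_text_de are hypothetical. It only models per-character folding, not stopword removal or the light stemmer's suffix stripping.

```python
# Hypothetical, simplified model of the character folding in the text_de chain:
# LowerCaseFilter -> GermanNormalizationFilter -> GermanLightStemFilter.
# Stopwords and suffix stripping are deliberately NOT modelled.

# Assumed GermanNormalizationFilter mapping: umlauts lose their dots, ß -> ss.
# (The real filter also handles ae/oe/ue digraphs; omitted in this sketch.)
GERMAN_NORMALIZATION = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

# Assumed accent folding in GermanLightStemFilter: only a/o/i/u variants are
# listed, so é, è, ê (and ã, å, ā, ă, æ) pass through unchanged.
LIGHT_STEM_FOLDING = str.maketrans({
    "à": "a", "á": "a", "â": "a",
    "ò": "o", "ó": "o", "ô": "o",
    "ì": "i", "í": "i", "î": "i", "ï": "i",
    "ù": "u", "ú": "u", "û": "u",
})


def analyze_text_de(term: str) -> str:
    """Approximate the characters of the indexed/queried term for one token."""
    lowered = term.lower()  # str.lower() keeps ß as ß, like a per-char lowercase
    return lowered.translate(GERMAN_NORMALIZATION).translate(LIGHT_STEM_FOLDING)


print(analyze_text_de("ÁÀÂÄÃåĀĂÆæöüßéèêíóú"))  # aaaaãåāăææousséèêiou
print(analyze_text_de("Roséwein"), analyze_text_de("Rosewein"))  # roséwein rosewein
```

Under these assumptions the model yields exactly the parsedquery shown above, and Roséwein and Rosewein end up as two different terms, which would match the observation that name:Rosewein finds nothing.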
And: is there a way to convert all these special vowels to their ASCII representation, while still allowing umlauts to be converted to ae, Ae, ue, Ue and so on (without having to recompile Solr)?
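To make the requested behaviour concrete, here is a minimal Python sketch of the target mapping, outside Solr. It is not a Solr configuration; how to get the same effect inside the analyzer chain is exactly the open question. The function name fold_keep_umlauts and the UMLAUT_EXPANSION table are hypothetical.

```python
import unicodedata

# Hypothetical target mapping: German umlauts (and ß) are expanded,
# every other accent is stripped (é -> e, å -> a, ā -> a, ...).
UMLAUT_EXPANSION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def fold_keep_umlauts(text: str) -> str:
    # 1. Compose first (NFC) so a decomposed "u + U+0308" is also caught,
    #    then expand umlauts before any accent stripping can touch them.
    expanded = unicodedata.normalize("NFC", text).translate(UMLAUT_EXPANSION)
    # 2. Decompose (NFD) and drop nonspacing combining marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", expanded)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(fold_keep_umlauts("Roséwein"))  # Rosewein
print(fold_keep_umlauts("Müller"))    # Mueller
```

One limitation of this decomposition approach: ligatures such as Æ/æ have no Unicode decomposition, so they survive unchanged and would need explicit entries in the table.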