I'm using:
- Haystack - 2.1.0
- ElasticSearch - 0.90.3
- pyelasticsearch - 0.6
I've configured a custom backend to change the default Elasticsearch settings and use a Spanish analyzer (a rough sketch of that backend follows the settings).
These are the settings I'm using for Elasticsearch:
"settings" : {
"index": {
"uuid": "IPwcMthwRpSJzpjtarc9eQ",
"analysis": {
"analyzer": {
"default": {
"filter": ["standard", "lowercase", "asciifolding", ],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"number_of_shards": "10",
}
},
"analyzer": {
"spanish": {
"tokenizer": "standard",
"filter": [
"lowercase",
"spanish_stop",
"spanish_keywords",
"spanish_stemmer"
]
}
}
I took these settings from an answer here. When I apply them to Elasticsearch and reindex my models, I get behaviour that I'm not sure I understand.
I have some objects with names like "Ciencias" and others like "Ciéncies". When I search for "ciencias", I get back objects with names like "Ciencias" as well as "Ciéncies", and the same thing happens when I search for "ciencies" or "ciéncies".
I want Elasticsearch to ignore accents, which is why I'm using asciifolding, and I'm using the Spanish analyzer because most of the text is in Spanish. What I don't understand is why two different words like "cienciAs" and "cienciEs" return the same results.
Why is this happening? Is it because of a default ngram analyzer that is splitting the words?
Why does searching for "cienciAs" return objects with names like "ciénciEs"?
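For reference, this is roughly how I can check which tokens each analyzer actually emits, using the _analyze API (a sketch with requests; the index name "haystack" and the localhost URL are assumptions for my setup):

    import requests

    # Ask Elasticsearch how each analyzer tokenizes a sample word.
    # Index name and host are assumptions; adjust to the real setup.
    for analyzer in ('default', 'spanish'):
        resp = requests.get(
            'http://localhost:9200/haystack/_analyze',
            params={'analyzer': analyzer, 'text': u'Ciéncies'},
        )
        print('%s: %s' % (analyzer, resp.json()))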