Consider the following index settings as an example:
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_hunspell": {
          "type": "hunspell",
          "language": "en_GB"
        }
      },
      "analyzer": {
        "my_test": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": ["my_hunspell"]
        }
      }
    }
  }
}
I've downloaded the Hunspell dictionaries from the official Mozilla page.
The issue is that some words, for instance beer, are over-analyzed. The following request turns beer into bee, which is not really correct:
POST /test/_analyze?analyzer=my_test&text=beer
{
  "tokens": [
    {
      "token": "bee",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
Hunspell syntax is quite hard to understand. What can be done to avoid this behaviour? Is it possible to preserve certain words or to add a rule?
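A possible direction, sketched under assumptions and not tested against this setup: Lucene's Hunspell stem filter is documented to skip tokens that carry the keyword attribute, so a keyword_marker token filter placed before my_hunspell might keep selected words intact. The filter name my_protected_words and its word list are made up for illustration; everything else mirrors the settings above.

```
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_protected_words": {
          "type": "keyword_marker",
          "keywords": ["beer"]
        },
        "my_hunspell": {
          "type": "hunspell",
          "language": "en_GB"
        }
      },
      "analyzer": {
        "my_test": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": ["my_protected_words", "my_hunspell"]
        }
      }
    }
  }
}
```

If the assumption holds, the same _analyze request would be expected to return beer unchanged, while words outside the keywords list would still go through the dictionary. The order in the filter array matters here: the marker has to run before the stemmer for the flag to be seen.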