While migrating an Elasticsearch index from 2.3.3 to 5.1.1, we noticed that index creation time has risen from less than 20 seconds to 17 minutes. This configuration is from a development environment running in a Vagrant box.
An overview of the settings and mappings looks like this:
{
  "index": "your_index_name",
  "settings": {
    "index.requests.cache.enable": true,
    "index.unassigned.node_left.delayed_timeout": "5m",
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "char_filter": {
        #YOUR CHAR FILTERS
      },
      "analyzer": {
        "your_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": ["custom_pattern_1", "custom_pattern_2", "custom_pattern_3"],
          "filter": ["lowercase", "massive_synonym_list_filter", "long_synonym_list_filter"]
        }
      },
      "filter": {
        "long_synonym_list_filter": {
          "type": "keep",
          "keep_words": ["list-of-25k-words"],
          "keep_words_case": false
        },
        "massive_synonym_list_filter": {
          "tokenizer": "keyword",
          "type": "synonym",
          "synonyms": ["list-of-40k-synonyms"]
        }
      }
    }
  },
  "mappings": {
    ...
    #YOUR MAPPING
    ...
  }
}
If we take the massive synonym list out of the equation, the index is created very quickly, but unfortunately we do need the synonym lists we were benefiting from in ES 2.3.3.
I'm using the latest official Elasticsearch PHP client provided by Elastic. I'm not storing the words and synonyms in a file; I add them as arrays within the analysis filter settings.
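To make the inline-array approach concrete, here is a minimal sketch of how such a body might be assembled. It is written in Python rather than PHP purely for illustration; the helper name `build_index_body` and the two tiny sample lists are hypothetical, and the char filters and mappings are left out because they are elided in the overview above.

```python
import json


def build_index_body(index_name, keep_words, synonyms):
    """Assemble the create-index parameters with both word lists inlined as arrays."""
    return {
        "index": index_name,
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            "analysis": {
                "analyzer": {
                    "your_analyzer": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": [
                            "lowercase",
                            "massive_synonym_list_filter",
                            "long_synonym_list_filter",
                        ],
                    }
                },
                "filter": {
                    # "keep" filter: the whole word list travels inside the request
                    "long_synonym_list_filter": {
                        "type": "keep",
                        "keep_words": list(keep_words),
                        "keep_words_case": False,
                    },
                    # "synonym" filter: every rule is one string in the array
                    "massive_synonym_list_filter": {
                        "type": "synonym",
                        "tokenizer": "keyword",
                        "synonyms": list(synonyms),
                    },
                },
            },
        },
    }


# Hypothetical stand-ins for the real 25k-word and 40k-synonym lists.
sample_keep_words = ["laptop", "notebook"]
sample_synonyms = ["laptop, notebook"]

body = build_index_body("your_index_name", sample_keep_words, sample_synonyms)
payload = json.dumps(body)
print(len(payload), "bytes in the serialized settings")
```

With the real lists, the same structure simply carries tens of thousands of strings in the two arrays, so the entire word list is part of the index settings sent at creation time.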
Many thanks!
EDIT
Here is a hot threads dump taken while the index is being created and the CPU is at 200%: http://pastebin.com/5tJJNGBC
UPDATE: more information is available on the ES forums: https://discuss.elastic.co/t/200-cpu-elasticsearch-5-index-creation-very-slow-with-a-huge-synonyms-list/69052
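For anyone who wants to capture a comparable dump, a rough sketch follows. It calls the nodes hot threads endpoint (`/_nodes/hot_threads`) using only the Python standard library; the host `http://localhost:9200`, the helper names, and the `threads`/`interval` values are assumptions for illustration, not the exact command used for the dump above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host="http://localhost:9200", threads=3, interval="500ms"):
    """Build the URL for the nodes hot threads API (host is an assumed default)."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{host}/_nodes/hot_threads?{query}"


def fetch_hot_threads(url, timeout=10):
    """Return the plain-text hot threads report from a running node."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, while the index creation is in progress and a node is reachable:
#   print(fetch_hot_threads(hot_threads_url()))
```

Running it a few times during the slow index creation gives a picture of which threads are consuming the CPU.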