Jest provides a brilliant async API for Elasticsearch, and we find it very useful. However, the resulting requests sometimes turn out to be slightly different from what we would expect.
Usually we didn't care, since everything was working fine, but in this case it was not.
I want to create an index with a custom ngram analyzer. When I do this following the Elasticsearch REST API docs, I call the following:
curl -XPUT 'localhost:9200/test' --data '
{
  "settings": {
    "number_of_shards": 3,
    "analysis": {
      "filter": {
        "keyword_search": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 15
        }
      },
      "analyzer": {
        "keyword": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "keyword_search"
          ]
        }
      }
    }
  }
}'
and then I confirm that the analyzer is configured properly using:
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
In response I receive multiple tokens such as exp, expe, expec and so on.
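To make the expected result concrete, here is a minimal Java sketch of what I understand this analyzer to do: split on whitespace, lowercase, then emit leading substrings of 3 to 15 characters. The class and method names (EdgeNgramSketch, analyze) are hypothetical and this is only an illustration, not Elasticsearch's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not Elasticsearch code.
class EdgeNgramSketch {

    // Mimics: whitespace tokenizer -> lowercase filter -> edge n-gram filter.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // splitting an empty input yields one empty string
            }
            String lower = word.toLowerCase(Locale.ROOT);
            int limit = Math.min(maxGram, lower.length());
            // Emit every prefix whose length is between minGram and limit.
            for (int len = minGram; len <= limit; len++) {
                tokens.add(lower.substring(0, len));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Expecting many tokens", 3, 15));
    }
}
```

For the test phrase this yields exp, expe, expec and so on up to expecting, followed by the prefixes of many and tokens, which is the kind of response I get from the index created through the REST call.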
Now, using the Jest client, I put the config JSON in a file on my classpath; its content is exactly the same as the body of the PUT request above. I execute the Jest action constructed like this:
new CreateIndex.Builder(name)
    .settings(
        ImmutableSettings.builder()
            .loadFromClasspath("settings.json")
            .build()
            .getAsMap()
    )
    .build();
As a result:
Primo - I checked with tcpdump that what's actually posted to Elasticsearch is (pretty printed):
{
  "settings.analysis.filter.keyword_search.max_gram": "15",
  "settings.analysis.filter.keyword_search.min_gram": "3",
  "settings.analysis.analyzer.keyword.tokenizer": "whitespace",
  "settings.analysis.filter.keyword_search.type": "edge_ngram",
  "settings.number_of_shards": "3",
  "settings.analysis.analyzer.keyword.filter.0": "lowercase",
  "settings.analysis.analyzer.keyword.filter.1": "keyword_search",
  "settings.analysis.analyzer.keyword.type": "custom"
}
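The dotted keys in this body look like what you get when a nested JSON structure is flattened into a single-level map, with array elements getting a numeric suffix. Below is a minimal Java sketch of such a flattening, under the assumption that this is roughly what happens on the way to getAsMap(). FlattenSketch and flatten are hypothetical names; this is not the actual ImmutableSettings or Jest code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not ImmutableSettings or Jest code.
class FlattenSketch {

    // Turns nested maps and lists into a flat map with dot-separated keys.
    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Object value, Map<String, String> out) {
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String key = prefix.isEmpty() ? String.valueOf(e.getKey()) : prefix + "." + e.getKey();
                flattenInto(key, e.getValue(), out);
            }
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(prefix + "." + i, list.get(i), out); // list index becomes a key segment
            }
        } else {
            out.put(prefix, String.valueOf(value)); // leaf values end up as strings
        }
    }

    public static void main(String[] args) {
        // A cut-down version of the settings file, including its top-level "settings" key.
        Map<String, Object> keyword = new LinkedHashMap<>();
        keyword.put("filter", Arrays.asList("lowercase", "keyword_search"));
        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("keyword", keyword);
        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", analyzer);
        Map<String, Object> settings = new LinkedHashMap<>();
        settings.put("number_of_shards", 3);
        settings.put("analysis", analysis);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("settings", settings);

        flatten(root).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sketch the top-level "settings" key of the file survives as a prefix on every flattened key, and numbers come out as strings, which matches the captured request body.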
Secundo - the resulting index settings are:
{
  "test": {
    "settings": {
      "index": {
        "settings": {
          "analysis": {
            "filter": {
              "keyword_search": {
                "type": "edge_ngram",
                "min_gram": "3",
                "max_gram": "15"
              }
            },
            "analyzer": {
              "keyword": {
                "filter": [
                  "lowercase",
                  "keyword_search"
                ],
                "type": "custom",
                "tokenizer": "whitespace"
              }
            }
          },
          "number_of_shards": "3" <-- the only difference from the one created with rest call
        },
        "number_of_shards": "3",
        "number_of_replicas": "0",
        "version": {"created": "1030499"},
        "uuid": "Glqf6FMuTWG5EH2jarVRWA"
      }
    }
  }
}
Tertio - checking the analyzer with
curl -XGET 'localhost:9200/test/_analyze?analyzer=keyword&text=Expecting many tokens'
I get just one token!
Question 1. Why does Jest not post my original settings JSON, but a processed version of it instead?
Question 2. Why do the settings generated by Jest not work?