I am trying to understand the position_increment_gap as it is explained on the Elasticsearch documentation https://www.elastic.co/guide/en/elasticsearch/guide/current/_multivalue_fields_2.html
I created the same index as in the example and inserted a single document
PUT /my_index/groups/1
{
"names": [ "John Abraham", "Lincoln Smith", "Justin Trudeau"]
}
Then I try a phrase query for Abraham Lincoln and it matches, as expected
GET /my_index/groups/_search
{
"query": {
"match_phrase": {
"names": "Abraham Lincoln"
}
}
}
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "names",
"_type": "doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"names": [
"john abraham",
"lincoln smith",
"justin trudeau"
]
}
}
]
}
}
The documentation explains that the match occurs because ES produces the tokens john abraham lincoln smith justin trudeau and it recommends inserting a position_increment_gap of 100 to avoid matching abraham lincoln unless I have a slop of 100.
I changed the index to have a position_increment_gap of 1 as shown below:
PUT names
{
"mappings": {
"doc": {
"properties": {
"names": {
"type":"text",
"position_increment_gap": 1
}
}
}
}
}
If I'm understanding the documentation, using a gap of 1 should allow me to match "abraham smith". But it doesn't match. Nor does "abraham lincoln", "abraham justin", or "abraham trudeau". "lincoln smith", "john abraham" and "justin trudeau" all continue to match.
I must be misunderstanding the documentation.
Thanks for any suggestions.