Understand Elasticsearch Multivalue Fields

Question

I am trying to understand position_increment_gap as it is explained in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_multivalue_fields_2.html

I created the same index as in the example and inserted a single document:

PUT /my_index/groups/1
{
    "names": [ "John Abraham", "Lincoln Smith", "Justin Trudeau"]
}

Then I ran a phrase query for "Abraham Lincoln", and it matched, as expected:

GET /my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": "Abraham Lincoln"
        }
    }
}

{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "names",
        "_type": "doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "names": [
            "john abraham",
            "lincoln smith",
            "justin trudeau"
          ]
        }
      }
    ]
  }
}

The documentation explains that the match occurs because ES produces the tokens john, abraham, lincoln, smith, justin, trudeau at consecutive positions. It recommends setting a position_increment_gap of 100 so that "abraham lincoln" no longer matches unless the query has a slop of 100.
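To make the position arithmetic concrete, here is a minimal Python sketch of how token positions might be assigned across the values of an array field for a given gap. It is an illustration only, not Elasticsearch code: the function name token_positions is hypothetical, tokenization is reduced to lowercasing and splitting on whitespace, and positions are counted from 0 (the guide counts from 1, so its 1, 2, 103, 104 corresponds to 0, 1, 102, 103 here).

```python
def token_positions(values, gap):
    """Return (token, position) pairs for a multi-valued text field.

    Simplified model (an assumption, not the real analyzer): tokens are
    lowercased words, and `gap` extra positions are skipped between values.
    """
    pairs = []
    pos = -1  # so that the first token lands on position 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance inserted between array values
        for token in value.lower().split():
            pos += 1
            pairs.append((token, pos))
    return pairs


names = ["John Abraham", "Lincoln Smith", "Justin Trudeau"]
for gap in (0, 1, 100):
    print(gap, token_positions(names, gap))
```

Under this model, abraham and lincoln are gap + 1 positions apart: adjacent with a gap of 0, and separated by one empty position with a gap of 1.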

I changed the index to have a position_increment_gap of 1 as shown below:

PUT names
{
  "mappings": {
    "doc": {
      "properties": {
        "names": {
          "type":"text",
          "position_increment_gap": 1
        }
      }
    }
  }
}

If I'm understanding the documentation, using a gap of 1 should allow me to match "abraham smith". But it doesn't match. Nor does "abraham lincoln", "abraham justin", or "abraham trudeau". "lincoln smith", "john abraham" and "justin trudeau" all continue to match.

I must be misunderstanding the documentation.

Thanks for any suggestions.

https://www.elastic.co/guide/en/elasticsearch/reference/current/position-increment-gap.html seems to use 0 for the gap so maybe 1 is too much? Also your link with a gap 100 produces 1, 2, 103, 104 so (logically) a gap of 1 would produce 1,2,4,5 — apokryfos, Mar 27 '18 at 20:55
Yes, a gap of 0 appears to be the default and allows "abraham lincoln" to match. I agree that a gap of 1 should produce 1,2,4,5 - so I would expect "abraham smith" to match but it did not. — user2434291, Mar 27 '18 at 22:12
It wouldn't match because there's a gap between abraham and smith and you're using a phrase match. — apokryfos, Mar 27 '18 at 22:26
The confusion appears to be twofold: 1) that the default position_increment_gap is 100 as per version 6.2 documentation, and 2) I was using an older version of Elasticsearch and the documentation I referenced above was for ES 2.x. — user2434291, Mar 28 '18 at 02:04
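
The arithmetic discussed in these comments can be checked with a small Python sketch that estimates the slop a phrase query would need for a given gap. This is a simplified model and an assumption on my part, not the actual Lucene phrase scorer: each term's position is shifted by its index in the phrase, and the required slop is taken as the spread of those shifted positions. The names positions_by_token and required_slop are hypothetical.

```python
from itertools import product


def positions_by_token(values, gap):
    """Map each token to its positions, skipping `gap` positions between values."""
    index = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            pos += 1
            index.setdefault(token, []).append(pos)
    return index


def required_slop(values, gap, phrase):
    """Smallest slop at which `phrase` would match, or None if a term is absent.

    Simplified model: a slop of 0 means the terms sit at consecutive positions.
    """
    index = positions_by_token(values, gap)
    terms = phrase.lower().split()
    if any(term not in index for term in terms):
        return None
    # Shift each term's positions by its offset in the phrase; an exact
    # phrase match then has all shifted positions equal (spread of 0).
    shifted = [[p - i for p in index[term]] for i, term in enumerate(terms)]
    return min(max(combo) - min(combo) for combo in product(*shifted))


names = ["John Abraham", "Lincoln Smith", "Justin Trudeau"]
for phrase in ("abraham lincoln", "abraham smith", "lincoln smith"):
    print(phrase, required_slop(names, 1, phrase))
```

With a gap of 1, this model gives a required slop of 1 for "abraham lincoln", 2 for "abraham smith", and 0 for "lincoln smith", which is consistent with only the phrases inside a single value matching a plain match_phrase query.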
