I have a field named tags in my Elasticsearch documents with the following structure.

tags = [
    {
        "id": 10,
        "related": [9, 8, 7]
    }
]

I now run a filter with a list, e.g. [10, 9]. I want to return only those documents that contain all the items in the list, either in id or in related. If I search with [9, 8], the above document should be returned. If I search with [9, 12], it shouldn't be returned, as 12 isn't present in either id or related.

I tried the terms filter, but it simply performs an OR. Is there any technique that can achieve the above goal?

Further, I would like to give a higher ranking to documents that contain the given items in id than to those that contain them in related.

Sudip Kafle

1 Answer

Problem Analysis

Let's break your problem into the following subproblems:

  • (P1) Check whether all the terms provided in the array are present in either tags.id or tags.related. This can be further decomposed into:
    • (P1.1) Check whether all the terms provided in the array are present in a field
    • (P1.2) Check whether all the terms provided in the array are spread across different fields
  • (P2) Assign a higher score to those documents having any of the provided terms as tags.id

Solution

To solve (P1.1), you can use the terms_set query, available in Elasticsearch v6.6 (see documentation).

To solve (P1.2), I'd copy all the values of tags.id and tags.related into a new custom field, named, e.g., tags.all. This can be achieved using the copy_to property as follows:

{
  "mappings": {
    "_doc": {
      "properties": {
        "tags": {
          "properties": {
            "id": { 
              "type": "long",
              "copy_to": "tags.all"
            },
            "related": { 
              "type": "long",
              "copy_to": "tags.all"
            }
          }
        }
      }
    }
  }
}
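
The mapping above uses the _doc mapping type of Elasticsearch 6.x. On 7.x and later, where mapping types were removed, the equivalent would presumably look like the sketch below. Declaring tags.all explicitly is my own addition (an assumption, not something the mapping above requires), so that its type does not depend on dynamic mapping.

```json
{
  "mappings": {
    "properties": {
      "tags": {
        "properties": {
          "id": {
            "type": "long",
            "copy_to": "tags.all"
          },
          "related": {
            "type": "long",
            "copy_to": "tags.all"
          },
          "all": {
            "type": "long"
          }
        }
      }
    }
  }
}
```

With either mapping, the document from the question should be searchable through tags.all with the values 10, 9, 8 and 7. Two things to keep in mind: copied values are indexed but do not appear in _source, and since tags is a plain array of objects, the values of every entry in the array end up in the same tags.all field.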

Then, to solve (P1), you can run your terms_set query against tags.all, requiring as many matches as there are terms in the list (here, 2). E.g.,

{
  "query": {
    "terms_set": {
      "tags.all": {
        "terms": [ 9, 8 ],
        "minimum_should_match_script": {
          "source": "2"
        }
      }
    }
  }
}
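
If you would rather not hard-code the number of terms, the script can read it from params.num_terms, which terms_set exposes to minimum_should_match_script. A minimal sketch, using the [9, 12] list from the question:

```json
{
  "query": {
    "terms_set": {
      "tags.all": {
        "terms": [ 9, 12 ],
        "minimum_should_match_script": {
          "source": "params.num_terms"
        }
      }
    }
  }
}
```

For the document in the question, only 9 is found in tags.all, so one term matches where two are required and the document is not returned. With [9, 8] both terms match and it is.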

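Finally, to solve (P2), you can wrap the terms_set query described above in the must clause of a bool query, so that it stays mandatory, and add a terms query against tags.id with a higher boost factor as a should clause, which only raises the score of documents that also match on tags.id. (If both queries were should clauses, a document matching just one of the ids in tags.id would be returned as well.) I.e.,

{
  "query": {
    "bool": {
      "must": [
        {
          "terms_set": {
            "tags.all": {
              "terms": [ 9, 8 ],
              "minimum_should_match_script": {
                "source": "2"
              }
            }
          }
        }
      ],
      "should": [
        {
          "terms": {
            "tags.id": [ 9, 8 ],
            "boost": 2
          }
        }
      ]
    }
  }
}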
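
Note that a terms query gives the same score to a document whether one or several of the ids are found in tags.id. If documents matching more ids in tags.id should rank higher, one possible variant (a sketch, not tested against a cluster) keeps the terms_set query as a mandatory filter and uses one boosted term clause per id, so that the scores of the matching clauses add up. The boost value of 2 is arbitrary.

```json
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms_set": {
            "tags.all": {
              "terms": [ 9, 8 ],
              "minimum_should_match_script": {
                "source": "params.num_terms"
              }
            }
          }
        }
      ],
      "should": [
        { "term": { "tags.id": { "value": 9, "boost": 2 } } },
        { "term": { "tags.id": { "value": 8, "boost": 2 } } }
      ]
    }
  }
}
```

Because the bool query contains a filter clause, the should clauses are optional: documents that match only through tags.related are still returned, with a lower score.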
glenacota