0

Sample of the data present in my ES index :

{
  "entities" : [
    {
      "fieldName" : "abc"
    },
    {
      "fieldName" : "def"
    }
  ], 
  "entities" : [
    {
      "fieldName" : "abc"
    },
    {
      "fieldName" : "def"
    }
  ],
 "entities" : [
    {
      "fieldName" : "abc"
    },
    {
      "fieldName" : "def"
    },
    {
      "fieldName" : "gh"
    }
  ]
}

I would like to find only those documents where the fieldName only matches "abc" and "def", so I have tried a nested match query of ES but the problem is it is also matching documents having extra fields apart from "abc" and "def".

GET fetch_latest_version/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "entities",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "entities.fieldName": "abc"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "entities",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "entities.fieldName": "def"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }

The result of the above query is it will list all the 3 documents present in sample data. I would only want it to match the 1st and 2nd document only(like an exact match) where entities list has only "abc" and "def" fields. I shouldn't match the 3rd document where fieldName are ("abc","def","gh") .

2 Answers2

0

You can add a must_not query, to remove all the documents having"entities.fieldName":"gh". Modify your search query as shown below

    {
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "entities",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "entities.fieldName": "abc"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "entities",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "entities.fieldName": "def"
                    }
                  }
                ]
              }
            }
          }
        }
      ],
      "must_not": {
        "nested": {
          "path": "entities",
          "query": {
            "bool": {
              "should": [
                {
                  "match": {
                    "entities.fieldName": "gh"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

Search Result will be

"hits": [
      {
        "_index": "66381155",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.8266786,
        "_source": {
          "entities": [
            {
              "fieldName": "abc"
            },
            {
              "fieldName": "def"
            }
          ]
        }
      },
      {
        "_index": "66381155",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.8266786,
        "_source": {
          "entities": [
            {
              "fieldName": "abc"
            },
            {
              "fieldName": "def"
            }
          ]
        }
      }
    ]

Update 1:

AFAIK there is no way to tell elasticsearch to ensure that EVERY document must have "fieldName":"abc" AND "fieldName":"def". One way could be to check for this condition, before indexing the documents. You can create a top-level, non-nested field, containing that value.

ESCoder
  • 15,431
  • 2
  • 19
  • 42
  • This is just a sample where the extra field "gh" is present in the data there might be any number of extra fields whose value is unknown to me. Also, let's say a document just has a field "abc" alone that also shouldn't match. Only fields that will be specified should match exactly. – Sheikh Mannan Sohail Feb 26 '21 at 06:59
  • @SheikhMannanSohail please go through the updated query and updated part of the answer, and let me know if this resolves your issue ? – ESCoder Feb 26 '21 at 08:09
  • I checked the updated query but as I told we cannot have a must_not because we won't know the values that might exist in fieldName apart from "gh". Also, there can be any number of extra unwanted fieldName like "ij", "kl" anything. The update1 is quite unclear to me. Can you please elaborate? – Sheikh Mannan Sohail Feb 26 '21 at 09:17
0

I dont think there is a exact children match query, so I would recommend to create a single field with the combination as a key : ie : "abcdef" and then do an exact search against that one.

If you need to solve it without touching the data then you can use a script query :

GET /fetch_latest_version/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "script": {
              "script": {
                "lang": "painless",
                "source": "def x = doc['entities.fieldName.keyword']; x.containsAll(params.filter) && params.filter.length == x.length",
                "params": {
                  "filter": [
                    "def",
                    "abc"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

I had to remove the nested field type to painless to work

I first validate that both "abc" AND "def" are present in the document, and then I validate the length of the filters and the children are the same.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "fetch_latest_version",
        "_type" : "_doc",
        "_id" : "xf733XcBhQfuY9se1B5L",
        "_score" : 1.0,
        "_source" : {
          "entities" : [
            {
              "fieldName" : "abc"
            },
            {
              "fieldName" : "def"
            }
          ]
        }
      },
      {
        "_index" : "fetch_latest_version",
        "_type" : "_doc",
        "_id" : "xv743XcBhQfuY9seFR6t",
        "_score" : 1.0,
        "_source" : {
          "entities" : [
            {
              "fieldName" : "abc"
            },
            {
              "fieldName" : "def"
            }
          ]
        }
      }
    ]
  }
}
llermaly
  • 2,331
  • 2
  • 16
  • 29