
In a Python request to an Elasticsearch 1.7 index, I want the API to return only the requested IDs in a list field, and to return null or nothing for the other values in that list. The rest of the response should remain the same, in other words, be independent of this 'filter'.

With the request below, the index seems to return all IDs in the hit/doc (in hits.hits._source.foo.bar) as soon as one ID in the list matches.

Python request example

ID_list = ["1", "2", "3", "4"]  # example list standing in for the real IDs to be requested
payload = {
    "_source": {
        "include": ["jada", "foo.bar"]
    },
    "query": {
        "bool": {
            "must": [
                {"terms": {"foo.bar": ID_list}},  # requesting the specific IDs
                {"match": {"something.something-else": "this"}},
                {"match": {"some.more": "that"}}
            ]
        }
    },
    "sort": ["_doc"]
}

response example

"hits":{
        "hits":[
                {"_source":
                      {"jada":"bla1",
                       "foo":[{"bar":"1"},
                              {"bar":"2"},
                              {"bar":"99"}]}}, # wish to be null or not returned
                {"_source":
                      {"jada":"bla2",
                       "foo":[{"bar":"2"},
                              {"bar":"99"}]}}, # wish to be null or not returned
                {"_source":
                      {"jada":"bla3",
                       "foo":[{"bar":"3"},
                              {"bar":"98"}, # wish to be null or not returned
                              {"bar":"1"},
                              {"bar":"99"}]}}, # wish to be null or not returned
]}

JSON in index example

"hits":{
        "hits":[
                {"_source":
                      {"jada":"bla1",
                       "foo":[{"bar":"1",
                               "time":"time1"},
                              {"bar":"2",
                               "time":"time2"},
                              {"bar":"99",
                               "time":"time3"}]}},
                {"_source":
                      {"jada":"bla2",
                       "foo":[{"bar":"2",
                               "time":"time4"},
                              {"bar":"99",
                               "time":"time5"}]}},
                {"_source":
                      {"jada":"bla3",
                       "foo":[{"bar":"3",
                               "time":"time6"},
                              {"bar":"98",
                               "time":"time7"},
                              {"bar":"1",
                               "time":"time8"},
                              {"bar":"99",
                               "time":"time9"}]}},
                {"_source":
                      {"jada":"bla4",
                       "foo":[{"bar":"98",
                               "time":"time10"},
                              {"bar":"99",
                               "time":"time11"}]}},
]}

My preliminary request uses source filtering to get only certain fields, and match and terms queries to select on the values of fields that take only one value.

Here, I am asking how to request only certain values of fields that may contain multiple values.

jaspreet chahal, Eli, Joe - ElasticsearchBook.com, and binariedMe suggest nested inner hits or script queries for some seemingly related problems. Would one of these work here? If so, how?

Thanks!

Edit: following the advice to query nested inner hits

payload = {
    "_source": {
        "include": ["jada"]
    },
    "query": {
        "bool": {
            "must": [
                {"match": {"something.something-else": "this"}},
                {"match": {"some.more": "that"}},
                {"nested": {
                    "path": "foo",
                    "query": {
                        "bool": {
                            "must": [
                                {"terms": {"foo.bar": ID_list}}]}},
                    "inner_hits": {"_source": ["foo.bar"]}}}
            ]
        }
    },
    "sort": ["_doc"]
}

returns: nested: QueryParsingException[[index-name-here] [nested] nested object under path [foo] is not of nested type]; }]","status":400}'

Johan

1 Answer


Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.

If the field foo is of the object type, it will be stored internally as below.

{
  "foo.bar": ["1", "2", ...],
  "foo.xyz": ["a", "b", ...]
}

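So when you run a terms query or any other query against foo.bar, it returns every document in which at least one value of this field matches, and the complete foo array comes back with it.

You can choose which fields are returned (using _source filtering), but not which values of a field are returned.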
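
To make the flattening concrete, here is a rough Python illustration using the first document from the question. The helper `flatten_object_array` is hypothetical and only mimics the idea; it is not how Elasticsearch is implemented.

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Roughly mimic how an array of objects under `field` ends up as one
    multi-valued list per leaf field (illustration only, not ES code)."""
    flat = defaultdict(list)
    for obj in doc.get(field, []):
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


doc = {"jada": "bla1",
       "foo": [{"bar": "1", "time": "time1"},
               {"bar": "2", "time": "time2"},
               {"bar": "99", "time": "time3"}]}

flat = flatten_object_array(doc, "foo")
print(flat)
# {'foo.bar': ['1', '2', '99'], 'foo.time': ['time1', 'time2', 'time3']}

# A terms query matches when any value of foo.bar is in the requested list.
# The match is on the document as a whole, so its whole _source is returned.
requested = {"1", "2", "3", "4"}
matches = any(value in requested for value in flat["foo.bar"])
```

Once the values sit in one flat list per field, nothing records which "bar" belonged to which object, so the query has no per-object match to report.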

Nested Field

The nested type is a specialised version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.

With the nested type, each object (for example {"bar": "1"}) is indexed as a separate hidden document. That is why you can select which sub-documents are returned using inner_hits.
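
A minimal sketch of what a nested mapping for foo might look like as a Python dict, assuming Elasticsearch 1.x mapping syntax. The type name "my_type" is a placeholder, and the string types for bar and time are assumptions based on the example documents; the real field types may differ.

```python
import json

# Hypothetical create-index body: "my_type" is a placeholder type name.
nested_mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "foo": {
                    "type": "nested",  # each foo object is indexed separately
                    "properties": {
                        # not_analyzed so that terms queries match exact IDs
                        "bar": {"type": "string", "index": "not_analyzed"},
                        "time": {"type": "string"},  # assumed type
                    },
                }
            }
        }
    }
}

# The body would be sent as JSON when creating the new index.
body = json.dumps(nested_mapping)
```

The type of an existing field cannot be switched to nested in place, so the data would have to be indexed again into an index created with such a mapping.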

jaspreet chahal
  • Unfortunately, the API returns `nested: QueryParsingException[[index-name-here] [nested] nested object under path [bar] is not of nested type]; }]","status":400}'`. – Johan Aug 02 '22 at 08:27
  • @JohanAndresen field datatypes are defined when the mapping is created. To filter out such data on the Elasticsearch side, you will need to recreate the index with the field defined as nested. If you cannot change the mapping, then the filtering needs to be handled on the client side – jaspreet chahal Aug 02 '22 at 11:34
  • Thanks jaspreet chahal. Your comment answers my question. – Johan Aug 17 '22 at 10:41
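
If the mapping cannot be changed, the client-side filtering mentioned in the comments could look roughly like the sketch below. The helper `keep_requested_ids` is hypothetical (not part of any Elasticsearch client), and it assumes foo is always a list of objects, as in the examples from the question.

```python
def keep_requested_ids(hits, id_list, field="foo", key="bar"):
    """Return copies of the hits whose `field` array keeps only the objects
    whose `key` value is in id_list (hypothetical helper, not an ES API)."""
    wanted = set(id_list)
    filtered = []
    for hit in hits:
        source = dict(hit["_source"])  # shallow copy, original hit is untouched
        # Assumes `field` holds a list of objects, as in the question's examples.
        source[field] = [obj for obj in source.get(field, [])
                         if obj.get(key) in wanted]
        filtered.append({**hit, "_source": source})
    return filtered


ID_list = ["1", "2", "3", "4"]
hits = [
    {"_source": {"jada": "bla1",
                 "foo": [{"bar": "1"}, {"bar": "2"}, {"bar": "99"}]}},
    {"_source": {"jada": "bla3",
                 "foo": [{"bar": "3"}, {"bar": "98"}, {"bar": "1"}, {"bar": "99"}]}},
]

filtered_hits = keep_requested_ids(hits, ID_list)
for hit in filtered_hits:
    print(hit["_source"])
# {'jada': 'bla1', 'foo': [{'bar': '1'}, {'bar': '2'}]}
# {'jada': 'bla3', 'foo': [{'bar': '3'}, {'bar': '1'}]}
```

With a real response, the list to pass in would be the one under hits.hits. Everything else in each hit is left as returned by the index.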