1

Documents Structure

Doc_1 {
"title":"hello",
"myObject":{
 "key1":"value1",
 "key2":"value2"
 }
}
Doc_2 {
"title":"hello world",
"myObject":{
 "key2":"value4",
 "key3":"value3"
 }
}
Doc_3 {
"title":"hello world2",
"myObject":{
 "key1":"value1",
 "key3":"value3"
 }
}

Information: myObject contains dynamic key-value pair.

Objective: My objective is to write an aggregation query to return the number of unique all dynamic key-value pairs.

Attempt and explanation: I can easily get results for known keys in this way.

       {
        "size":0,
        "query":{
               "match":{"title":"hello"}
                },
        "aggs":{
               "key1Agg":{
                    "terms":{"field":"myObject.key1.keyword"}
                },
               "key2Agg":{
                    "terms":{"field":"myObject.key2.keyword"}
                },
               "key3Agg":{
                    "terms":{"field":"myObject.key3.keyword"}
               }
              }
          }

This is the typical result of the above hardcoded nested keys aggregation.

{
...
"aggregations": {
    "key1Agg": {
        ...        
        "buckets": [
            {
                "key": "value1",
                "doc_count": 2
            }

        ]
    },
    "key2Agg": {
        ...
        "buckets": [
            {
                "key": "value2",
                "doc_count": 1
            },
            {
                "key": "value4",
                "doc_count": 1
            }

        ]
    },
    "key3Agg": {
       ...
        "buckets": [
            {
                "key": "value3",
                "doc_count": 2
            }

        ]
    }
}

}

Now all I want is to return the count of all dynamic key-value pairs, i.e without putting any hardcore key names in an aggregation query.

I am using ES 6.3, Thanks in Advance!!

nikhil024
  • 435
  • 5
  • 19

1 Answers1

2

From the information you have provided, it appears that myObject seems to be of object datatype and not nested datatype.

Well, there is no easy way to do without modifying the data you have, what you can do and possibly the simplest solution would be is to include an additional field say let's call it as myObject_list which would be of type keyword where the documents would be as follows:

Sample Documents:

POST test_index/_doc/1
{
 "title":"hello",
  "myObject":{
   "key1":"value1",
   "key2":"value2"
  },
  "myObject_list": ["key1_value1", "key2_value2"]     <--- Note this
}

POST test_index/_doc/2
{
 "title":"hello world",
  "myObject":{
   "key2":"value4",
   "key3":"value3"
  },
  "myObject_list": ["key2_value4", "key3_value3"]     <--- Note this
}

POST test_index/_doc/3
{
 "title":"hello world2",
  "myObject":{
   "key1":"value1",
   "key3":"value3"
  },
  "myObject_list": ["key1_value1", "key3_value3"]     <--- Note this
}

You can have a query as simple as below:

Request Query:

POST test_index/_search
{
  "size": 0,
  "aggs": {
    "key_value_aggregation": {
      "terms": {
        "field": "myObject_list",              <--- Make sure this is of keyword type
        "size": 10
      }
    }
  }
}

Note that I've used Terms Aggregation over here.

Response:

{
  "took" : 406,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "key_value_aggregation" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "key1_value1",
          "doc_count" : 2
        },
        {
          "key" : "key3_value3",
          "doc_count" : 2
        },
        {
          "key" : "key2_value2",
          "doc_count" : 1
        },
        {
          "key" : "key2_value4",
          "doc_count" : 1
        }
      ]
    }
  }
}

Hope this helps!

Kamal Kunjapur
  • 8,547
  • 2
  • 22
  • 32
  • Thanks, @kamal, I have updated the title question to object type instead of nested type. Does it mean there is no other way to achieve this without adding a corresponding new field of type keyword? – nikhil024 Oct 09 '19 at 18:10
  • 1
    Well I think it can be done via painless script but I'm not sure about this (I know this could be done for `nested` type but not sure on `object` type). Would need to evaluate and see. Let me see if I can spend sometime this weekend on this and let you know. Hope that would be okay!! FYI, if you want to try refer to this answer and see if it helps. https://stackoverflow.com/a/53192844/3838328 – Kamal Kunjapur Oct 09 '19 at 18:49
  • 1
    Also more importantly, painless scripts are not recommended as it may hit performance and even more so when it comes to high computations or simple calculations on large data set. Regardless, the optimum solution would be the one I've posted as an answer. – Kamal Kunjapur Oct 09 '19 at 18:51