I have a document like the one below. Its structure is a "contents" field holding many sub-objects under random keys (note that there is no fixed format for the keys; they may just be arbitrary strings such as UUIDs). I want to find the maximum value of start_time across all keys in "contents" with an ES query. What can I do for this? The document:

{"contents": {
    "key1": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key1_name"
    },
    "key2": {
        "start_time": "2020-08-01T00:00:19.500Z",
        "last_event_published_time": "2020-08-01T23:59:03.738Z",
        "last_event_timestamp": "2020-08-01T23:59:03.737Z",
        "size": 1590513,
        "read_offset": 1590513,
        "name": "key2_name"
    }
}}

I have tried Joe's solution and it works. But when I modify the document like this:

{
    "timestamp": "2020-08-01T23:59:59.359Z",
    "type": "beats_stats",
    "beats_stats": {
        "metrics": {
            "filebeat": {
                "harvester": {
                    "files": {
                        "d47f60db-ac59-4b51-a928-0772a815438a": {
                            "start_time": "2020-08-01T00:00:18.320Z",
                            "last_event_published_time": "2020-08-01T23:59:03.738Z",
                            "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                            "size": 1590513,
                            "read_offset": 1590513,
                            "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                        },
                        "e47f60db-ac59-4b51-a928-0772a815438a": {
                            "start_time": "2020-08-01T00:00:19.500Z",
                            "last_event_published_time": "2020-08-01T23:59:03.738Z",
                            "last_event_timestamp": "2020-08-01T23:59:03.737Z",
                            "size": 1590513,
                            "read_offset": 1590513,
                            "name": "/data/logs/galogs/ga_log_2020-08-01.log"
                        }
                    }
                }
            }
        }
    }
}

It fails with the following error:

"error" : {
"root_cause" : [
  {
    "type" : "script_exception",
    "reason" : "runtime error",
    "script_stack" : [
      "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
      "                                                                               ^---- HERE"
    ],
    "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
    "lang" : "painless"
  }
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
  {
    "shard" : 0,
    "index" : "agg-test-index-1",
    "node" : "B4mXZVgrTe-MsAQKMVhHUQ",
    "reason" : {
      "type" : "script_exception",
      "reason" : "runtime error",
      "script_stack" : [
        "for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            ",
        "                                                                               ^---- HERE"
      ],
      "script" : "\n          for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n            state.start_millis_arr.add(\n              Instant.parse(entry.start_time).toEpochMilli()\n            );\n          }\n        ",
      "lang" : "painless",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  }
]}
Selyx
  • Can you please share the index mapping? What have you tried so far on your end? Rather than an image link, can you please add the JSON? It would be easier for others to follow. – Sahil Gupta Aug 06 '20 at 11:02
  • OK! This is the first time I've posted a question. I tried using aggs and sorting by start_time, but I found I can't specify a concrete field path for it (contents.*.start_time). I have added the JSON doc. The original doc is far more complicated, and I can't modify the mapping without sufficient permissions. @SahilGupta – Selyx Aug 07 '20 at 02:37

1 Answer


You can use a scripted_metric to calculate those. It's quite onerous but certainly possible.

Mimicking your index and indexing a few docs:

POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}

POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}

Get the max date of unknown random sub-objects:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
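
Note that reduce_script returns states unchanged, so the aggregation value is a list with one combined result per shard. If a single overall maximum is wanted instead, here is a minimal sketch of a cross-shard reduce (an addition, not part of the query above; it parses the per-shard strings back to epoch millis and skips null entries):

"reduce_script": """
  // states holds one ISO-8601 string (or null) per shard
  long best = Long.MIN_VALUE;
  for (def s : states) {
    if (s != null) {
      long t = Instant.parse(s).toEpochMilli();
      if (t > best) {
        best = t;
      }
    }
  }
  return best == Long.MIN_VALUE
    ? null
    : DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(best));
"""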

BTW: @Sahil Gupta's comment is right -- never use images where pasting the text is possible (and helpful).

Joe - GMapsBook.com
  • Thanks Joe! I tried to run your query in Kibana but it returned a script_exception. Is it an environment problem? – Selyx Aug 07 '20 at 03:20
  • Replace the current `params._source.beats_stats...` path with `params._source['beats_stats']['metrics']['filebeat']['harvester']['files']` – Joe - GMapsBook.com Aug 07 '20 at 11:03
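
Putting that comment to work, here is a minimal sketch of the adapted query for the nested document (assuming the agg-test-index-1 index from the error output). Because params._source is a map of maps, a single dotted string like 'beats_stats.metrics...' is looked up as one literal key and returns null, which is what caused the null_pointer_exception above. The null-safe ?. walk and the empty-shard guard below are additions on top of the bracket-chain fix, so the script also tolerates docs that lack the nested path:

GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          // walk the nested maps one key at a time; ?. yields null instead
          // of throwing when an intermediate level is missing
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              state.start_millis_arr.add(
                Instant.parse(entry.start_time).toEpochMilli()
              );
            }
          }
        """,
        "combine_script": """
          // a shard may have seen no docs containing the nested path
          if (state.start_millis_arr.isEmpty()) {
            return null;
          }
          // sort in-place; first element is then the highest
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(state.start_millis_arr[0])
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}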