ElasticSearch/Kibana: get values that are not found in entries more recent than a certain date

Question

I have a fleet of devices that push to ElasticSearch at regular intervals (let's say every 10 minutes) entries of this form:

{
    "deviceId": "unique-device-id",
    "timestamp": 1586390031,
    "payload" : { various data }
}

I usually look at this through Kibana by filtering for the last 7 days of data and then drilling down by device id or some other piece of data from the payload.

Now I'm trying to get a sense of the health of this fleet by finding devices that haven't reported anything in the last hour let's say. I've been messing around with all sorts of filters and visualisations and the closest I got to this is a data table with device ids and the timestamp of the last entry for each, sorted by timestamp. This is useful but is somewhat hard to work with as I have a few thousand devices.

What I dream of is getting either the above mentioned table to contain only the device ids that have not reported in the last hour, or getting only two numbers: the total count of distinct device ids seen in the last 7 days and the total count of device ids not seen in the last hour.

Can you point me in the right direction, if any one of these is even possible?

Joe - GMapsBook.com · Accepted Answer · 2020-04-10T14:32:55.517

I'll skip the table and take the second approach -- only getting the counts. I think it's possible to walk your way backwards to the rows from the counts.

Note: I'll be using a human readable time format instead of timestamps but epoch_seconds will work just as fine in your real use case. Also, I've added the comment field to give each doc some background.

First, set up a your index:

PUT fleet
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "epoch_second||yyyy-MM-dd HH:mm:ss"
      },
      "comment": {
        "type": "text"
      },
      "deviceId": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Sync a few docs -- I'm in UTC+2 so I chose these timestamps:

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-05 10:00:00",
  "comment": "in the last week"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 13:05:00",
  "comment": "#asdjhfa343 in the last hour"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343",
  "timestamp": "2020-04-10 12:05:00",
  "comment": "#asdjhfa343 in the 2 hours"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-07 09:00:00",
  "comment": "in the last week"
}

POST fleet/_doc
{
  "deviceId": "asdjhfa343sdas",
  "timestamp": "2020-04-10 12:35:00",
  "comment": "in last 2hrs"
}

In total, we've got 5 docs and 2 distinct device ids w/ the following conditions

all have appeared in the last 7d
both of which in the last 2h and
only one of which in the last hour

so I'm interested in finding precisely 1 deviceId which has appeared in the last 2hrs BUT not last 1hr.

Using a combination of filter (for range filters), cardinality (for distinct counts) and bucket script (for count differences) aggregations.

GET fleet/_search
{
  "size": 0,
  "aggs": {
    "distinct_devices_last7d": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-7d"
          }
        }
      },
      "aggs": {
        "uniq_device_count": {
          "cardinality": {
            "field": "deviceId.keyword"
          }
        }
      }
    },
    "not_seen_last1h": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-2h"
          }
        }
      },
      "aggs": {
        "device_ids_per_hour": {
          "date_histogram": {
            "field": "timestamp",
            "calendar_interval": "day",
            "format": "'disregard' -- yyyy-MM-dd"
          },
          "aggs": {
            "total_uniq_count": {
              "cardinality": {
                "field": "deviceId.keyword"
              }
            },
            "in_last_hour": {
              "filter": {
                "range": {
                  "timestamp": {
                    "gte": "now-1h"
                  }
                }
              },
              "aggs": {
                "uniq_count": {
                  "cardinality": {
                    "field": "deviceId.keyword"
                  }
                }
              }
            },
            "uniq_difference": {
              "bucket_script": {
                "buckets_path": {
                  "in_last_1h": "in_last_hour>uniq_count",
                  "in_last2h": "total_uniq_count"
                },
                "script": "params.in_last2h - params.in_last_1h"
              }
            }
          }
        }
      }
    }
  }
}

The date_histogram aggregation is just a placeholder that enables us to use a bucket script to get the final difference and not have to do any post-processing.

Since we passed size: 0, we're not interested in the hits section. So taking only the aggregations, here are the annotated results:

...
"aggregations" : {
    "not_seen_last1h" : {
      "doc_count" : 3,
      "device_ids_per_hour" : {
        "buckets" : [
          {
            "key_as_string" : "disregard -- 2020-04-10",
            "key" : 1586476800000,
            "doc_count" : 3,            <-- 3 device messages in the last 2hrs
            "total_uniq_count" : {
              "value" : 2               <-- 2 distinct IDs
            },
            "in_last_hour" : {
              "doc_count" : 1,
              "uniq_count" : {
                "value" : 1             <-- 1 distict ID in the last hour
              }
            },
            "uniq_difference" : {
              "value" : 1.0             <-- 1 == final result !
            }
          }
        ]
      }
    },
    "distinct_devices_last7d" : {
      "meta" : { },
      "doc_count" : 5,                  <-- 5 device messages in the last 7d
      "uniq_device_count" : {
        "value" : 2                     <-- 2 unique IDs
      }
    }
  }

ElasticSearch/Kibana: get values that are not found in entries more recent than a certain date

1 Answers1

Linked