I'll skip the table and take the second approach -- only getting the counts. I think it's possible to walk your way backwards to the rows from the counts.
Note: I'll be using a human readable time format instead of timestamps but epoch_seconds
will work just as fine in your real use case. Also, I've added the comment
field to give each doc some background.
First, set up a your index:
PUT fleet
{
"mappings": {
"properties": {
"timestamp": {
"type": "date",
"format": "epoch_second||yyyy-MM-dd HH:mm:ss"
},
"comment": {
"type": "text"
},
"deviceId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Sync a few docs -- I'm in UTC+2 so I chose these timestamps:
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-05 10:00:00",
"comment": "in the last week"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-10 13:05:00",
"comment": "#asdjhfa343 in the last hour"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343",
"timestamp": "2020-04-10 12:05:00",
"comment": "#asdjhfa343 in the 2 hours"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343sdas",
"timestamp": "2020-04-07 09:00:00",
"comment": "in the last week"
}
POST fleet/_doc
{
"deviceId": "asdjhfa343sdas",
"timestamp": "2020-04-10 12:35:00",
"comment": "in last 2hrs"
}
In total, we've got 5 docs and 2 distinct device ids w/ the following conditions
- all have appeared in the last 7d
- both of which in the last 2h and
- only one of which in the last hour
so I'm interested in finding precisely 1 deviceId
which has appeared in the last 2hrs BUT not last 1hr.
Using a combination of filter (for range filters), cardinality (for distinct counts) and bucket script (for count differences) aggregations.
GET fleet/_search
{
"size": 0,
"aggs": {
"distinct_devices_last7d": {
"filter": {
"range": {
"timestamp": {
"gte": "now-7d"
}
}
},
"aggs": {
"uniq_device_count": {
"cardinality": {
"field": "deviceId.keyword"
}
}
}
},
"not_seen_last1h": {
"filter": {
"range": {
"timestamp": {
"gte": "now-2h"
}
}
},
"aggs": {
"device_ids_per_hour": {
"date_histogram": {
"field": "timestamp",
"calendar_interval": "day",
"format": "'disregard' -- yyyy-MM-dd"
},
"aggs": {
"total_uniq_count": {
"cardinality": {
"field": "deviceId.keyword"
}
},
"in_last_hour": {
"filter": {
"range": {
"timestamp": {
"gte": "now-1h"
}
}
},
"aggs": {
"uniq_count": {
"cardinality": {
"field": "deviceId.keyword"
}
}
}
},
"uniq_difference": {
"bucket_script": {
"buckets_path": {
"in_last_1h": "in_last_hour>uniq_count",
"in_last2h": "total_uniq_count"
},
"script": "params.in_last2h - params.in_last_1h"
}
}
}
}
}
}
}
}
The date_histogram
aggregation is just a placeholder that enables us to use a bucket script
to get the final difference and not have to do any post-processing.
Since we passed size: 0
, we're not interested in the hits
section. So taking only the aggregations, here are the annotated results:
...
"aggregations" : {
"not_seen_last1h" : {
"doc_count" : 3,
"device_ids_per_hour" : {
"buckets" : [
{
"key_as_string" : "disregard -- 2020-04-10",
"key" : 1586476800000,
"doc_count" : 3, <-- 3 device messages in the last 2hrs
"total_uniq_count" : {
"value" : 2 <-- 2 distinct IDs
},
"in_last_hour" : {
"doc_count" : 1,
"uniq_count" : {
"value" : 1 <-- 1 distict ID in the last hour
}
},
"uniq_difference" : {
"value" : 1.0 <-- 1 == final result !
}
}
]
}
},
"distinct_devices_last7d" : {
"meta" : { },
"doc_count" : 5, <-- 5 device messages in the last 7d
"uniq_device_count" : {
"value" : 2 <-- 2 unique IDs
}
}
}