I have an Elasticsearch cluster containing an index with the following schema:
{
  "my_index" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "city" : {
            "type" : "keyword"
          },
          "prober" : {
            "type" : "keyword"
          },
          "status" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "test" : {
            "type" : "keyword"
          },
          "time_taken" : {
            "type" : "float"
          },
          "test_start" : {
            "type" : "date"
          },
          "test_uuid" : {
            "type" : "keyword"
          },
          "validity_start" : {
            "type" : "date"
          },
          "validity_end" : {
            "type" : "date"
          }
        }
      }
    }
  }
}
Each test_uuid is unique, and each test has different parameters as laid out in the schema.
I am trying to find all documents whose validity window, from validity_start to validity_end, contains a specific timestamp.
My current query looks like this:
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "validity_start": { "lte": "<timestamp>" }}},
        { "range": { "validity_end": { "gte": "<timestamp>" }}}
      ]
    }
  }
}
I have two questions.
1. I want to group the returned results by (test, city, prober). Within a given (validity_start, validity_end) window, there could be more than one document with the same (test, city, prober) values. I want to filter out every document except the most recently executed test, which can be determined by the test_start time. Is there a way to do this in Elasticsearch using aggs or any other feature?
2. Once I have these results, I want to perform an aggregation across certain fields, based on the status field. For example, if there are 100 results spread across two tests, testA and testB, I want to aggregate:
a. the number of results which have testA and are passing
b. the number of results which have testA and are failing
c. the number of results which have testB and are passing
d. the number of results which have testB and are failing
Similarly for cities and probers.
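As a possible starting point for the first question, and only a sketch: nested terms aggregations on test, city, and prober, with a top_hits sub-aggregation (size 1, sorted by test_start descending), would return the most recent document for each combination. The aggregation names (by_test, by_city, by_prober, latest, by_status) and the bucket size of 100 are assumptions chosen for illustration, and this has not been run against the index above. The by_status aggregation counts status values per test over all documents in the window, not only the latest one per group, so it only partly covers the second question.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "validity_start": { "lte": "<timestamp>" } } },
        { "range": { "validity_end": { "gte": "<timestamp>" } } }
      ]
    }
  },
  "aggs": {
    "by_test": {
      "terms": { "field": "test", "size": 100 },
      "aggs": {
        "by_city": {
          "terms": { "field": "city", "size": 100 },
          "aggs": {
            "by_prober": {
              "terms": { "field": "prober", "size": 100 },
              "aggs": {
                "latest": {
                  "top_hits": {
                    "size": 1,
                    "sort": [ { "test_start": { "order": "desc" } } ]
                  }
                }
              }
            }
          }
        },
        "by_status": {
          "terms": { "field": "status.keyword" }
        }
      }
    }
  }
}
```

Restricting the status counts to the latest document per group would probably have to happen client-side over the top_hits results, or with a different approach altogether.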