Elasticsearch aggregation - support for GROUP BY

Question

I have an elasticsearch cluster with an index with the following schema:

{
  "my_index" : {
    "mappings" : {
      "test" : {
        "properties" : {
          "city" : {
            "type" : "keyword"
          },
          "prober" : {
            "type" : "keyword"
          },
          "status" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "test" : {
            "type" : "keyword"
          },
          "time_taken" : {
            "type" : "float"
          },
          "test_start" : {
            "type" : "date"
          },
          "test_uuid" : {
            "type" : "keyword"
          },
          "validity_start" : {
            "type" : "date"
          },
          "validity_end" : {
            "type" : "date"
          }
        }
      }
    }
  }
}

Each test_uuid is unique and each test has different params as laid out in the schema. I am trying to filter/find all documents which, at a specific timestamp, fall under the window of validity_start and validity_end times. My current query looks like this:

{
  "query": { 
    "bool": { 
      "filter": [ 
        { "range": { "validity_start": { "lte": "<timestamp>" }}},
        { "range": { "validity_end": { "gte": "<timestamp>" }}} 
      ]
    }
  }
}

I have two questions.

1. I want to group the returned results by (test, city, prober). Within a given window of (validity_start, validity_end), there could be more than one document with the same (test, city, prober) values. I want to filter out every document except the most recently executed test (which can be determined by the test_start time). Is there a way to do this in Elasticsearch using aggs or any other feature?
2. Once I have these results, I want to perform an aggregation across certain fields, based on the status field. For example, if there are 100 results spread across 2 tests, testA and testB, I want to aggregate

a. the number of results which have testA and are passing

b. number of results which have testA and are failing

c. number of results which have testB and are passing

d. number of results which have testB and are failing.

Similarly for cities and probers.

score 0 · Answer 1 · answered Jan 25 '18 at 07:33

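Regarding 'Once I have these results': by that point the query is already finished. Elasticsearch doesn't support join queries, so the second step cannot run against the results of the first. You can either do the aggregation on the client side or run a new query.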
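A minimal sketch of the client-side option, assuming the hits have already been fetched and reduced to their `_source` dictionaries. The helper names (`latest_per_group`, `count_by_status`), the sample documents, and the status values `passing`/`failing` are hypothetical. It also assumes `test_start` comes back as ISO-8601 strings in one consistent format and timezone, so plain string comparison orders them correctly.

```python
from collections import Counter


def latest_per_group(docs):
    """Keep only the most recent document per (test, city, prober)."""
    latest = {}
    for doc in docs:
        key = (doc["test"], doc["city"], doc["prober"])
        # Assumption: test_start values are same-format ISO-8601 strings,
        # so lexicographic comparison matches chronological order.
        if key not in latest or doc["test_start"] > latest[key]["test_start"]:
            latest[key] = doc
    return list(latest.values())


def count_by_status(docs, field):
    """Count documents per (value of `field`, status) pair."""
    return Counter((doc[field], doc["status"]) for doc in docs)


# Hypothetical sample data standing in for the `_source` of each hit.
sample_docs = [
    {"test": "testA", "city": "london", "prober": "p1",
     "status": "passing", "test_start": "2018-01-24T10:00:00Z"},
    {"test": "testA", "city": "london", "prober": "p1",
     "status": "failing", "test_start": "2018-01-24T11:00:00Z"},
    {"test": "testB", "city": "paris", "prober": "p2",
     "status": "passing", "test_start": "2018-01-24T09:00:00Z"},
]

if __name__ == "__main__":
    newest = latest_per_group(sample_docs)
    for field in ("test", "city", "prober"):
        print(field, dict(count_by_status(newest, field)))
```

The same `count_by_status` call works for `city` and `prober`, which covers the "similarly for cities and probers" part of the question.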
For your first question: what I always do is build a visualization in Kibana (with a sub-bucket terms aggregation in your case) and look at the request it sends (you can see it via the small arrow icon at the bottom of the visualization). It might not always be 100% accurate, but it gives you a starting point.
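For reference, a request with sub-bucket terms aggregations might look roughly like the body built below. This is a sketch, not the request Kibana would generate: the function name, the aggregation names (`by_test`, `by_city`, `by_prober`, `latest`), and the `bucket_size` default are made up for illustration. It reuses the range filter from the question, nests one `terms` aggregation per grouping field, and uses a `top_hits` sub-aggregation sorted by `test_start` to keep only the most recent document in each group.

```python
import json


def build_latest_per_group_query(timestamp, bucket_size=100):
    """Build a search body grouping by (test, city, prober).

    bucket_size is a hypothetical cap on buckets per level; groups beyond
    that cap are not returned by a terms aggregation.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the raw hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"validity_start": {"lte": timestamp}}},
                    {"range": {"validity_end": {"gte": timestamp}}},
                ]
            }
        },
        "aggs": {
            "by_test": {
                "terms": {"field": "test", "size": bucket_size},
                "aggs": {
                    "by_city": {
                        "terms": {"field": "city", "size": bucket_size},
                        "aggs": {
                            "by_prober": {
                                "terms": {"field": "prober", "size": bucket_size},
                                "aggs": {
                                    # Most recently started test in this group.
                                    "latest": {
                                        "top_hits": {
                                            "size": 1,
                                            "sort": [{"test_start": {"order": "desc"}}],
                                        }
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_latest_per_group_query("2018-01-25T07:33:00Z"), indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index. Counting the `latest` documents by status would still need a second step, for example on the client side.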
