0

I am a newbie to Elastic Search and I am trying to find out how to handle the scenario briefed here. I am having a schema where a document may contain data such as

{ 
   "country":"US",
   "zone": "East",
   "cluster": "Cluster1",
   "time_taken": 4500,
   "status": 0
},
{ 
   "country":"US",
   "zone": "East",
   "cluster": "Cluster1",
   "time_taken": 5000,
   "status": 0
},
{ 
   "country":"US",
   "zone": "East",
   "cluster": "Cluster1",
   "time_taken": 5000,
   "status": 1
},
{ 
   "country":"US",
   "zone": "East",
   "cluster": "Cluster2",
   "time_taken": 5000,
   "status": 0
}

Where status = 0 for success, 1 for failure

I would want to show a result in a way that it can reflect a hierarchy with values from "success" like

  • US/East/Cluster1 = 66% (which is basically 2 success and 1 failure)
  • US/East/Cluster2 = 100% (which is basically 1 success)
  • US/East = 75%
  • US = 75%

Alternatively, if there is also a way to get the time taken average for success and failure scenarios spread across this hierarchy like denoted above, would be great.

MatterixN
  • 1
  • 2

1 Answers1

0

I think a terms aggregation should get the job done for you.

In order to satisfy your first query examples (% success per cluster), try something like this:

{
  "aggs": {
    "byCluster": {
      "terms": {
        "field": "cluster"
      },
      "aggs": {
        "success_or_fail": {
          "terms": {
            "field": "status"
          }
        }
      }
    }
  }
}

This returns a result that looks something like this:

"aggregations": {
      "byCluster": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "cluster1",
               "doc_count": 3,
               "success_or_fail": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": 0,
                        "doc_count": 2
                     },
                     {
                        "key": 1,
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "cluster2",
               "doc_count": 1,
               "success_or_fail": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                     {
                        "key": 0,
                        "doc_count": 1
                     }
                  ]
               }
            }
         ]
      }
   }

You can take the doc_count for the 0 bucket of the "success_or_fail" (arbitrary name) aggregation and divide it by the doc_count for the corresponding cluster. This will give you the % success for each cluster. (2/3 for "cluster1" and 1/1 for "cluster2").

The same type of aggregation could be used to group by "country" and "zone".

UPDATE

You can also nest a avg aggregation next to the "success_or_fail" terms aggregation, in order to achieve the average time taken you were looking for.

As in:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "byCluster": {
      "terms": {
        "field": "cluster"
      },
      "aggs": {
        "success_or_fail": {
          "terms": {
            "field": "status"
          },
          "aggs": {
            "avg_time_taken": {
              "avg": {
                "field": "time_taken"
              }
            }
          }
        }
      }
    }
  }
}
BrookeB
  • 1,709
  • 13
  • 22