Elasticsearch summing buckets

Question

I have the following request, which returns the count of all documents with a status of "Accepted", "Released" or "Closed".

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "slices": {
      "terms": {
        "field": "status.raw",
        "include": {
          "pattern": "Accepted|Released|Closed"
        }
      }
    }
  }
}

In my case the response is:

"buckets": [
  {
    "key": "Closed",
    "doc_count": 2216
  },
  {
    "key": "Accepted",
    "doc_count": 8
  },
  {
    "key": "Released",
    "doc_count": 6
  }
]

Now I'd like to add all of them up into a single field. I tried using pipeline aggregations and even tried the following sum_bucket (which apparently only works on multi-bucket aggregations):

"total":{
    "sum_bucket":{
        "buckets_path": "slices"
    }
}

Anyone able to help me out with this?

Maybe a dumb question, but why not simply query on `status.raw:(Accepted OR Released OR Closed)` and then check the total hits? – Val Nov 03 '16 at 13:34
I need the buckets separately as well. I need the following: closed, accepted, released, total – Rick van Lieshout Nov 03 '16 at 13:52

score 4 · Answer 1 · answered Nov 03 '16 at 14:04

With sum_bucket and your existing aggregation, pointing buckets_path at each bucket's document count (`_count`):

  "aggs": {
    "slices": {
      "terms": {
        "field": "status.raw",
        "include": {
          "pattern": "Accepted|Released|Closed"
        }
      }
    },
    "sum_total": {
      "sum_bucket": {
        "buckets_path": "slices._count"
      }
    }
  }
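
As a rough illustration of what the `slices._count` path asks for, the sketch below applies the same summation client-side to the bucket counts from the question. The response dict is hand-written and abbreviated (it is not captured from a real cluster), and the helper name `sum_bucket_counts` is hypothetical. The `sum_total` entry only shows the general shape of a sum_bucket result, a single `value`.

```python
# Hand-written, abbreviated aggregation response using the counts from the question.
# Assumption: other response fields (hits, took, ...) are omitted for brevity.
response = {
    "aggregations": {
        "slices": {
            "buckets": [
                {"key": "Closed", "doc_count": 2216},
                {"key": "Accepted", "doc_count": 8},
                {"key": "Released", "doc_count": 6},
            ]
        },
        # Shape of a sum_bucket result: one numeric "value".
        "sum_total": {"value": 2230.0},
    }
}


def sum_bucket_counts(aggs, agg_name):
    """Sum doc_count over all buckets of one multi-bucket aggregation.

    This mirrors what buckets_path "<agg_name>._count" asks sum_bucket to do.
    """
    return sum(bucket["doc_count"] for bucket in aggs[agg_name]["buckets"])


aggs = response["aggregations"]
client_side_total = sum_bucket_counts(aggs, "slices")
print(client_side_total)           # 2230
print(aggs["sum_total"]["value"])  # 2230.0
```

Without `._count`, the path names the `slices` aggregation itself rather than a numeric value inside each bucket, which is why the attempt in the question did not work.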

answered Nov 03 '16 at 14:04

Andrei Stefan

Thanks, I accepted this answer because it's very close to my original query. I just missed ._count on my bucket path :) – Rick van Lieshout Nov 03 '16 at 14:06

score 1 · Accepted Answer · answered Nov 03 '16 at 14:01

I would use the filters aggregation instead and define all the buckets you need, like this:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "analyze_wildcard": true
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "slices": {
      "filters": {
        "filters": {
          "accepted": {
            "term": {
              "status.raw": "Accepted"
            }
          },
          "released": {
            "term": {
              "status.raw": "Released"
            }
          },
          "closed": {
            "term": {
              "status.raw": "Closed"
            }
          },
          "total": {
            "terms": {
              "status.raw": [
                "Accepted",
                "Released",
                "Closed"
              ]
            }
          }
        }
      }
    }
  }
}
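
A minimal sketch of reading the result, assuming the response follows the usual shape for named filters, where `buckets` is an object keyed by filter name. The dict below is hand-written from the counts in the question, and the `total` figure assumes every document carries exactly one `status.raw` value.

```python
# Hand-written, abbreviated response for the named "filters" aggregation above.
# Assumption: each document has a single status, so "total" equals the sum of the
# three individual buckets.
response = {
    "aggregations": {
        "slices": {
            "buckets": {
                "accepted": {"doc_count": 8},
                "released": {"doc_count": 6},
                "closed": {"doc_count": 2216},
                "total": {"doc_count": 2230},
            }
        }
    }
}

# Flatten the keyed buckets into a simple name -> count mapping.
buckets = response["aggregations"]["slices"]["buckets"]
counts = {name: bucket["doc_count"] for name, bucket in buckets.items()}

for name in ("closed", "accepted", "released", "total"):
    print(name, counts[name])
```

Because the buckets are keyed by name, the four figures asked for in the question (closed, accepted, released, total) come back from a single aggregation with no pipeline step.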

answered Nov 03 '16 at 14:01

Val

Seems like a viable answer as well, I'll keep this in mind and see which works better in the long run! Thank you! – Rick van Lieshout Nov 07 '16 at 09:04
I have no stats to back this up, but this version should be more efficient than the one with pattern inclusion which has to run on all different `status.raw` terms. It will probably depend on how many different terms you have. Your mileage may vary. – Val Nov 07 '16 at 09:08
Mmmmm, I hadn't thought of the speed aspect but logic dictates that you're right. Do you perhaps know how to divide by terms as well? I have some unsolved es related questions still open so if you'd like to take a look at those too I'd be ever grateful. – Rick van Lieshout Nov 07 '16 at 09:21
What do you mean by "divide by terms"? – Val Nov 07 '16 at 09:22
Take a look if you please: http://stackoverflow.com/questions/40420880/elasticsearch-bucket-script-and-buckets-paths-return-could-not-find-aggregato/40432271#40432271 ps: thanks for fixing the title, I noticed it too haha – Rick van Lieshout Nov 07 '16 at 09:27

score 0 · Answer 3 · answered Nov 03 '16 at 14:05

You could add a count with a value_count sub-aggregation and then use a sum_bucket pipeline aggregation:

{
  "aggs": {
    "unique_status": {
      "terms": {
        "field": "status.raw",
        "include": "Accepted|Released|Closed"
      },
      "aggs": {
        "count": {
          "value_count": {
            "field": "status.raw"
          }
        }
      }
    },
    "sum_status": {
      "sum_bucket": {
        "buckets_path": "unique_status>count"
      }
    }
  },
  "size": 0
}

answered Nov 03 '16 at 14:05

ChintanShah25

You can directly use `_count` ;-) – Andrei Stefan Nov 03 '16 at 14:05
Oh Yes, Thanks :) – ChintanShah25 Nov 03 '16 at 14:11
