On Elasticsearch 2.0.0, I'm trying to sort buckets on a multi-bucket terms
aggregation by the top hit count of a given term.
Here's an example to make things clearer. Say we have the following dataset representing customer purchases:
[{
"id": "95aee6b0-9c41-11e5-8994-feff819cdc9f",
"name": "Ingrid Bergman",
"channel": "WEB",
"productId": 3124,
"totalPreTax": 221.5
},
{
"id": "95aee6b0-9c41-11e5-8994-feff819cdc9f",
"name": "Ingrid Bergman",
"channel": "MOB",
"productId": 5527,
"totalPreTax": 12.5
},
{
"id": "95aee6b0-9c41-11e5-8994-feff819cdc9f",
"name": "Ingrid Bergman",
"channel": "WEB",
"productId": 1188,
"totalPreTax": 55.6
},
{
"id": "2854b9d6-9c42-11e5-8994-feff819cdc9f",
"name": "Luis Borges",
"channel": "IPAD",
"productId": 779,
"totalPreTax": 119.0
}]
What I need is to fetch, for each customer, the average totalPreTax
and their most frequent channel
, sort alphabetically by the latter (desc). That is,
[{
"id": "95aee6b0-9c41-11e5-8994-feff819cdc9f",
"name": "Ingrid Bergman",
"channel": "WEB",
"totalPreTax": 96.53
},
{
"id": "2854b9d6-9c42-11e5-8994-feff819cdc9f",
"name": "Luis Borges",
"channel": "IPAD",
"totalPreTax": 119.0
}]
So far, I have the following (showing only relevant bits), which sorts by the average totalPreTax
:
{ //...
"aggs": {
"byCustomer": {
"terms": {
"field": "customer.id",
"order": {
"averageTotalPreTax": "desc"
}
},
"aggs": {
"averageTotalPreTax": {
"avg": {
"field": "totalPreTax"
}
},
"channel": {
"terms": {
"field": "channel",
"order": {
"_term": "desc"
},
"size": 1
}
}
}
Problem is, figuring out the top channel
for each bucket (aka, each customer) requires a second multi-bucket term
sub-aggregation. I'm only interested in the first element of that resulting array, which holds the needed value for the most frequent channel. This, prevents me from sorting my byCustomer
aggregation by using something like:
"byCustomer": {
"terms": {
"field": "customer.id",
"order": {
"channel": "desc"
}
}
}
Resulting in:
AggregationExecutionException[Invalid terms aggregation order path [channel]. Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end
Someone out there, tackled something similar (but not quite my scenario), but didn't properly answered the question.
So, how would I go about doing this? Any ideas?
EDIT: Question not tied to ES version. Possible answers can apply to any particular API version.