1

I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection and thats working using elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.

My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, considering the following document and my current search call I would like to sort all the aggregations by their clicks sums.

I've tried using bucket_sort inside the inner aggs at the back_to_parent level but got the following java exception.

class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation 
(org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
  id: '32ead132eq13w21',
  statistics: {
    clicks: 123,
    views: 456
  },
  categories: [{ //nested type
    name: 'color',
    tags: [{ //nested type
      slug: 'blue'
    },{
      slug: 'red'
    }]
  }]
}
GET /acounts-123321/_search
{
  size: 0,
  aggs: {
    categories_parent: {
        nested: {
          path: 'categories.tags'
        },
        aggs: {
          filtered: {
            filter: {
              term: { 'categories.tags.category': 'color' }
            },
            aggs: {
              by_slug: {
                terms: {
                  field: 'categories.tags.slug',
                  size: perPage
                },
                aggs: {
                  back_to_parent: {
                    reverse_nested: {},
                    aggs: {
                      clicks: {
                        sum: {
                          field: 'statistics.clicks'
                        }
                      },
                      custom_metric: {
                        scripted_metric: {
                          init_script: 'state.accounts = []',
                          map_script: 'state.accounts.add(new HashMap(params["_source"]))',
                          combine_script: 'double result = 0;
                            for (acc in state.accounts) {
                              result += ( acc.statistics.clicks + acc.statistics.impressions);
                            }
                            return result;',
                          reduce_script: 'double sum = 0;
                            for (state in states) {
                              sum += state;
                            }                            
                            return sum;'
                        }
                      },
                      by_tag_sort: {
                        bucket_sort: {
                          sort: [{ 'clicks.value': { order: 'desc' } }]
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }

Update:

It would also be nice to understand how it would be possible to sort the buckets by a custom metric calculated through a painless scripted_metric. I have updated the search call above adding a sample custom_metric that I wish to allow sorting through it.

I see that using bucket_sort directly does not work with the standard sort array we use for concrete fields. So the following does not seem to sort things. It also won't work for a sort script as well since [bucket_sort] only supports field based sorting.

by_tag_sort: {
  bucket_sort: {
    sort: [{ 'custom_metric.value': { order: 'desc' } }]
  }
}
adolfosrs
  • 9,286
  • 5
  • 39
  • 67

2 Answers2

2

bucket_sort expects to be run within a multi-bucket context but your reverse_nested aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms aggregation).

The trick is to use an empty-ish filters aggregation to generate a multi-bucket context and then run the bucket sort:

{
  "size": 0,
  "aggs": {
    "categories_parent": {
      "nested": {
        "path": "categories.tags"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "term": {
              "categories.tags.category": "color"
            }
          },
          "aggs": {
            "by_slug": {
              "terms": {
                "field": "categories.tags.slug",
                "size": 10
              },
              "aggs": {
                "back_to_parent": {
                  "reverse_nested": {},
                  "aggs": {
                    "multi_bucket_emulator": {
                      "filters": {
                        "filters": {
                          "placeholder_match_all_query": {
                            "match_all": {}
                          }
                        }
                      },
                      "aggs": {
                        "clicks": {
                          "sum": {
                            "field": "statistics.clicks"
                          }
                        },
                        "by_tag_sort": {
                          "bucket_sort": {
                            "sort": [
                              {
                                "clicks.value": {
                                  "order": "desc"
                                }
                              }
                            ]
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Update: sorting by the result of a custom scripted metric value

{
  "size": 0,
  "aggs": {
    "categories_parent": {
      "nested": {
        "path": "categories.tags"
      },
      "aggs": {
        "filtered": {
          "filter": {
            "term": {
              "categories.tags.category": "color"
            }
          },
          "aggs": {
            "by_slug": {
              "terms": {
                "field": "categories.tags.slug",
                "size": 10
              },
              "aggs": {
                "back_to_parent": {
                  "reverse_nested": {},
                  "aggs": {
                    "multi_bucket_emulator": {
                      "filters": {
                        "filters": {
                          "placeholder_match_all_query": {
                            "match_all": {}
                          }
                        }
                      },
                      "aggs": {
                        "clicks": {
                          "sum": {
                            "field": "statistics.clicks"
                          }
                        },
                        "custom_metric": {
                          "scripted_metric": {
                            "init_script": "state.accounts = []",
                            "map_script": """state.accounts.add(params["_source"])""",
                            "combine_script": """
                                double result = 0;
                                for (def acc : state.accounts) {
                                  result += ( acc.statistics.clicks + acc.statistics.impressions);
                                }
                                return result;
                            """,
                            "reduce_script": """
                              double sum = 0;
                              for (def state : states) {
                                sum += state;
                              }                            
                              return sum;
                            """
                          }
                        },
                        "by_tag_sort": {
                          "bucket_sort": {
                            "sort": [
                              {
                                "custom_metric.value": {
                                  "order": "desc"
                                }
                              }
                            ]
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Joe - GMapsBook.com
  • 15,787
  • 4
  • 23
  • 68
  • Hey @joe-sorocin, I see that this sorting method does not work in case the aggregated metric is calculated using a script. I have also tried using a sort script but bucket_sort does not allow it too. `[bucket_sort] only supports field based sorting`. Is there anything I can do as a workaround? Thanks! – adolfosrs Feb 22 '21 at 03:36
  • Can you update your question accordingly? `statistics.clicks` is a concrete field. – Joe - GMapsBook.com Feb 22 '21 at 08:25
  • Have just updated it with an example of custom metric. Please let me know if you think it should be different or it is still missing some info. – adolfosrs Feb 22 '21 at 11:53
  • Please re-read my original answer first -- `bucket_sort` needs a multi-bucket context. But anyways, sorting by a custom metric is indeed possible -- see my answer update. I corrected your script loops because there are no `for ... in` loops in java... Just make sure you replace the triple quotes `"""` with backticks `\` ... \`` when putting it in your JS code. – Joe - GMapsBook.com Feb 22 '21 at 13:34
1

Joe - Elasticsearch Handbook - I have an equivalent query to yours (one that sorts by the result of a custom scripted metric) and I expect the response to your query looks something like the below.

I have noticed that sorting specified by the bucket_sort does not get applied to the uppermost buckets (i.e. by_slug.buckets), which are still sorted by the default doc_count ordering. This can also be verified by changing the custom_metric.value ordering from desc to asc, which has no effect on the order of the results.

My understanding of bucket_sort suggests that sorting based on the custom_metric is applied to the aggregation one level up, which in this case would be multi_bucket_emulator.buckets (but because this is an emulator it has no actual buckets to sort).

Is it possible to sort the by_slug.buckets based on the custom_metric values?

I am using Elasticsearch v7.10.

Thanks very much.

(Sorry for posting this question as an answer; it was too long to be a comment.)

Response (approximation):

{
    "aggregations": {
        "categories_parent": {
            "filtered": {
                "by_slug": {
                    "buckets": [
                        {
                            "key": "xxxxxx",
                            "back_to_parent": {
                                "multi_bucket_emulator": {
                                    "buckets": {
                                        "placeholder_match_all_query": {
                                            "clicks": {
                                                "buckets": [
                                                    {
                                                        "key": 5.0,
                                                        "doc_count": 1
                                                    },
                                                    …
                                                ]
                                            },
                                            "custom_metric": {
                                                "value": 20.0
                                            }
                                        }
                                    }
                                }
                            }
                        },
                        …
                    ]
                }
            }
        }
    }
}
Andy Gout
  • 302
  • 2
  • 19