How to determine the correct size for terms aggregation, which will produce accurate aggregation results?

Question

While reading the documentation for the Terms Aggregation, I came across the fact that its results are not always accurate, but that increasing the size can make them accurate.

I already know:

How Query-Then-Fetch works.
How the top terms are calculated on each shard (shard_size) and then merged on the coordinating node (size).
What "doc_count_error_upper_bound" means, and how it can indicate that the top results may contain errors and that the size needs to be increased.
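
For reference, the mechanism in the points above can be sketched as a small pure-Python simulation. This is only an illustration, not Elasticsearch code: the function name `terms_agg` and the sample shard data are made up, and the error bound is modelled as the sum of the last returned term's count on every shard that had to truncate its list.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Simulate a terms aggregation over per-shard term counts (hypothetical helper)."""
    merged = Counter()
    error_bound = 0
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms.
        top = Counter(shard).most_common(shard_size)
        merged.update(dict(top))
        # A shard that dropped terms may hide up to the count of its last
        # returned term for any term it did not report.
        if len(top) < len(shard):
            error_bound += top[-1][1]
    # The coordinating node merges the shard lists and keeps the top `size`.
    return merged.most_common(size), error_bound

# Made-up data: the true totals are a=7, b=9, c=4, so "b" is the real top term.
shards = [{"a": 6, "b": 5, "c": 1}, {"b": 4, "c": 3, "a": 1}]

print(terms_agg(shards, size=1, shard_size=1))  # wrong top term, non-zero error bound
print(terms_agg(shards, size=1, shard_size=3))  # every shard returns all its terms
```

With shard_size=1 the merged result reports "a" as the top term although "b" has the higher true count, and the error bound is non-zero. Once shard_size reaches the number of distinct terms, no shard truncates anything and the bound drops to 0.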

But is there a mathematical approach, or any other way, to determine the correct size to request once the first results turn out to be inaccurate?

score 0 · Answer 1 · answered Jan 20 '22 at 11:50

You will get accurate results as long as the aggregation size (the bucket count) is not lower than the cardinality of the field. If the cardinality is very high, you can try a very high shard_size, or raise search.max_buckets in the Elasticsearch settings (together with the size of the aggregation), though this will affect performance.
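
One way to apply this is to estimate the cardinality first and then request at least that many buckets. The sketch below only builds the two request bodies as Python dicts; it does not talk to a cluster. The helper names, the field name `user_id`, the `margin` parameter and the `max_buckets` value are assumptions for illustration. Note that the cardinality aggregation is itself approximate, which is why a margin is added on top of the estimate.

```python
def cardinality_request(field):
    """Request body that estimates the number of distinct terms in `field`."""
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {"distinct_terms": {"cardinality": {"field": field}}},
    }

def full_terms_request(field, estimated_cardinality, max_buckets, margin=10):
    """Request body asking for at least as many buckets as there are distinct terms.

    `max_buckets` should mirror the cluster's search.max_buckets setting
    (passed in by the caller; no default is assumed here).
    """
    size = estimated_cardinality + margin  # headroom, since the estimate is approximate
    if size > max_buckets:
        raise ValueError("size exceeds search.max_buckets; raise the setting or paginate")
    return {
        "size": 0,
        "aggs": {"all_terms": {"terms": {"field": field, "size": size}}},
    }

# Example with a hypothetical field and a hypothetical estimate of 1200 distinct terms.
estimate_body = cardinality_request("user_id")
terms_body = full_terms_request("user_id", estimated_cardinality=1200, max_buckets=65536)
print(terms_body["aggs"]["all_terms"]["terms"]["size"])
```

If the response to the second request still reports a non-zero doc_count_error_upper_bound, the estimate was probably too low and the size can be increased further.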

