4

I am trying to paginate over a specific field using the terms aggregation with partitions. The problem is that the number of returned terms for each partition is not equal to the size parameter that I set.

These are the steps I am following:

  1. Retrieve the number of different unique values for the field with "cardinality" aggregation. In my data, the result is 21.

  2. On the web page, the user wants to display a table with 10 items per page, so I compute the number of partitions:

    # Ceiling division: one extra partition when the division leaves a remainder
    if unique_values % page_size != 0:
        partitions_number = (unique_values // page_size) + 1
    else:
        partitions_number = unique_values // page_size
    

Then I run this simple query:

POST my_index/_search?pretty
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "field_to_paginate": "foo"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_pchostname": {
      "terms": {
        "size": 10,
        "field": "field_to_paginate",
        "include": {
          "partition": 0,
          "num_partitions": 3
        }
      }
    }
  }
}

I expect to retrieve 10 results, but when I run the query I get only 7. What am I missing here? Do I need to use a different solution?

As a side note, I can't use composite aggregation because I need to sort results by doc_count over the whole dataset.

betto86

3 Answers

2

Partitions in the terms aggregation divide the values into equal chunks.

In your case num_partitions is 3, so each partition holds 21 / 3 = 7 terms.

Partitions are meant for retrieving large numbers of values, on the order of thousands.
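
A minimal sketch of why a partition can hold fewer terms than `size`, assuming each term is assigned to a partition by hashing its value modulo num_partitions. The `zlib.crc32` hash and the `host-NN` term names are stand-ins for illustration only, not what Elasticsearch uses internally.

```python
import zlib
from collections import Counter


def partition_of(term: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): Elasticsearch uses its own hash function.
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def partition_sizes(terms, num_partitions):
    # Count how many terms land in each partition, including empty ones.
    counts = Counter(partition_of(t, num_partitions) for t in terms)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: 21 unique values, as in the question.
terms = [f"host-{i:02d}" for i in range(21)]
sizes = partition_sizes(terms, 3)
print(sizes)  # three counts that add up to 21
```

Under this assumption each partition receives about 21 / 3 = 7 terms, so a `size` of 10 can never be filled. The number of terms in a partition is driven by num_partitions and the hash, while `size` only caps how many of them are returned.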

alexgids
  • They are not divided in equal chunks. I had 3.5M items and with 100,000 partitions, size of buckets varied from 25 to 41. The terms aggregation is meant to return the top terms and does not allow pagination. – kundan Jan 19 '21 at 20:15
1

You may be able to leverage the shard_size parameter. I suggest reading this part of the manual and experimenting with shard_size.

Nirmal
0

Terms aggregation does not allow pagination. Use composite aggregation instead (requires ES >= 6.1.0). Below is the quote from reference docs:

If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination.
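
A minimal sketch of how paging with the composite aggregation could look, building only the request bodies. The helper name `composite_page_body`, the aggregation name `by_value`, and the source name `value` are hypothetical. Note that composite buckets are returned in key order, not sorted by doc_count.

```python
from typing import Optional


def composite_page_body(field: str, page_size: int,
                        after_key: Optional[dict] = None) -> dict:
    # One composite source over the field; the name "value" is arbitrary.
    composite = {
        "size": page_size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    # For every page after the first, pass the "after_key" object
    # returned in the previous response.
    if after_key is not None:
        composite["after"] = after_key
    return {"size": 0, "aggs": {"by_value": {"composite": composite}}}


first_page = composite_page_body("field_to_paginate", 10)
# Hypothetical after_key, as it might appear in the first response.
next_page = composite_page_body("field_to_paginate", 10, {"value": "foo"})
```

Each body would be sent to the `_search` endpoint in turn, stopping when a response contains no further buckets.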

kundan
  • Hi @kundan, but why does it not honour the "size" value? Can you check this question? https://stackoverflow.com/questions/67214253/elasticserch-terms-aggregation-with-partition-does-not-honor-the-size-value It always does without partitions – User3518958 Apr 23 '21 at 06:18