0

My documents include the fields name and date_year, and my goal is to find the most recently added names (e.g. the ten last added names with their first year of appearance and the total number of documents). I therefore have a terms aggregation on name, which is ordered by a min sub-aggregation on date_year:

{
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "order": {
          "start_year": "desc"
        }
      },
      "aggs": {
        "start_year": {
          "min": {
            "field": "date_year"
          }
        }
      }
    }
  }
}

This is returning unexpected results, when not adding size under terms. For example, the first bucket has doc_count 1 and start_year 2015, while I'm sure that there are tens of documents with this name, and the earliest date_year is 1870. When I add a large enough size, the results are accurate. For example:

{
  "aggs": {
    "group_by_name": {
      "terms": {
        "field": "name",
        "size": 10000,        <------ large enough value
        "order": {
          "start_year": "desc"
        }
      },
      "aggs": {
        "start_year": {
          "min": {
            "field": "date_year"
          }
        }
      }
    }
  }
}

Can anyone explain to me what is causing this, and how I can limit the number of buckets returned? What I need would look something like this in SQL:

select name, min(year), count(*) from documents group by name order by min(year) desc limit 10

user3170702
  • 1,971
  • 8
  • 25
  • 33
  • Thanks, but I don't think there's a way to add size to the sub-aggregation as `min` is not a bucket aggregation? – user3170702 Dec 20 '18 at 16:39
  • yeah my bad. Didnot see the min. You still have to use big size in order to send name in the subagg. Question is if you only look for the last added name, you could maybe add a filter querry first? or start with an first agg last year (size 1) at the top to only keep the laster date_year documents? – LeBigCat Dec 20 '18 at 17:02

0 Answers0