My documents include the fields name
and date_year
, and my goal is to find the most recently added names (e.g. the ten last added names with their first year of appearance and the total number of documents). I therefore have a terms aggregation on name
, which is ordered by a min sub-aggregation on date_year
:
{
"aggs": {
"group_by_name": {
"terms": {
"field": "name",
"order": {
"start_year": "desc"
}
},
"aggs": {
"start_year": {
"min": {
"field": "date_year"
}
}
}
}
}
}
This is returning unexpected results, when not adding size
under terms
. For example, the first bucket has doc_count
1 and start_year
2015, while I'm sure that there are tens of documents with this name, and the earliest date_year
is 1870. When I add a large enough size
, the results are accurate. For example:
{
"aggs": {
"group_by_name": {
"terms": {
"field": "name",
"size": 10000, <------ large enough value
"order": {
"start_year": "desc"
}
},
"aggs": {
"start_year": {
"min": {
"field": "date_year"
}
}
}
}
}
}
Can anyone explain to me what is causing this, and how I can limit the number of buckets returned? What I need would look something like this in SQL:
select name, min(year), count(*) from documents group by name order by min(year) desc limit 10