Is there a way to improve the performance of a nested terms aggregation without sampling?
Terms query:
GET <INDEX>/_search?pretty&request_cache=false
{
  "_source": false,
  "sort": [
    "_doc"
  ],
  "size": 0,
  "track_total_hits": false,
  "aggregations": {
    "nested_suggestions": {
      "nested": {
        "path": "measurement"
      },
      "aggs": {
        "suggestions": {
          "terms": {
            "field": "measurement.description.label",
            "size": 1
          }
        }
      }
    }
  }
}
...
{
  "took" : 8239,
  "timed_out" : false,
  ...
  "aggregations" : {
    "nested_suggestions" : {
      "doc_count" : 226139234,
      "suggestions" : {
        "doc_count_error_upper_bound" : 7445607,
        "sum_other_doc_count" : 214543500,
        "buckets" : [
          {
            "key" : "xxx",
            "doc_count" : 11635382
          }
        ]
      }
    }
  }
}
Cardinality query:
GET <INDEX>/_search?pretty&request_cache=false
{
  "_source": false,
  "sort": [
    "_doc"
  ],
  "size": 0,
  "track_total_hits": false,
  "aggregations": {
    "nested_suggestions": {
      "nested": {
        "path": "measurement"
      },
      "aggs": {
        "suggestions": {
          "cardinality": {
            "field": "measurement.description.label"
          }
        }
      }
    }
  }
}
...
{
  "took" : 8239,
  "timed_out" : false,
  ...
  "aggregations" : {
    "nested_suggestions" : {
      "doc_count" : 226139234,
      "suggestions" : {
        "value" : 1379
      }
    }
  }
}
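To put the two timings side by side, here is a rough back-of-the-envelope sketch in Python. It only uses figures copied from the responses above; the variable and function names are mine, and it assumes that every nested document counted in doc_count is visited exactly once and that "took" is a fair proxy for wall-clock time.

```python
# Figures copied from the two responses above.
NESTED_DOCS = 226_139_234      # nested_suggestions.doc_count
TERMS_TOOK_MS = 8_239          # "took" of the terms query
CARDINALITY_TOOK_MS = 5_688    # "took" of the cardinality query
TOP_BUCKET_DOCS = 11_635_382   # doc_count of the single returned bucket


def docs_per_second(doc_count: int, took_ms: int) -> float:
    """Nested documents processed per second (assumes one visit per doc)."""
    return doc_count / (took_ms / 1000)


terms_rate = docs_per_second(NESTED_DOCS, TERMS_TOOK_MS)
cardinality_rate = docs_per_second(NESTED_DOCS, CARDINALITY_TOOK_MS)
top_bucket_share = TOP_BUCKET_DOCS / NESTED_DOCS

print(f"terms:       {terms_rate / 1e6:.1f}M nested docs/s")
print(f"cardinality: {cardinality_rate / 1e6:.1f}M nested docs/s")
print(f"top bucket:  {top_bucket_share:.1%} of nested docs")
```

Both aggregations walk the same ~226 million nested documents, so the difference between the two rates is roughly the extra cost of building term buckets compared with only estimating the number of distinct values.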
Minimal mapping:
{
  "settings": {
    "number_of_replicas": "0",
    "number_of_shards": "10",
    "analysis": {
      "normalizer": {
        "raw_clean": {
          "type": "custom",
          "filter": [
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "keyword"
        },
        "measurement": {
          "type": "nested",
          "dynamic": "strict",
          "properties": {
            "id": {
              "type": "keyword"
            },
            "description": {
              "type": "text",
              "norms": false,
              "fields": {
                "label": {
                  "type": "keyword",
                  "normalizer": "raw_clean",
                  "ignore_above": 255,
                  "eager_global_ordinals": true
                }
              }
            }
          }
        }
      }
    }
  }
}
I've verified that the global ordinals have data via /_cat/fielddata?v.
Is this kind of performance expected with nested terms aggregations?
Environment:
- Elasticsearch 6.8.3
- index size: ~200 GB (with the full mapping)
- documents: ~1 million
- nested documents: ~225 million
- hardware: 4 CPUs, 16 GB RAM, 500 GB SSD
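For context, the ratios implied by these numbers can be worked out with a small Python sketch. The inputs are the approximate figures from the list above plus number_of_shards from the mapping; the variable names are mine, and the per-shard figure assumes documents are spread evenly across shards, which I have not verified.

```python
# Approximate figures from the Environment list and the index settings.
ROOT_DOCS = 1_000_000
NESTED_DOCS = 225_000_000
INDEX_SIZE_GB = 200
RAM_GB = 16
CPUS = 4
SHARDS = 10  # number_of_shards in the mapping

nested_per_root = NESTED_DOCS / ROOT_DOCS      # nested docs per top-level doc
nested_per_shard = NESTED_DOCS / SHARDS        # assumes an even distribution
ram_to_index_ratio = RAM_GB / INDEX_SIZE_GB    # upper bound on what RAM can hold
shards_per_cpu = SHARDS / CPUS

print(f"nested docs per root doc: {nested_per_root:.0f}")
print(f"nested docs per shard:    {nested_per_shard / 1e6:.1f}M")
print(f"RAM / index size:         {ram_to_index_ratio:.0%}")
print(f"shards per CPU:           {shards_per_cpu:.1f}")
```

So each top-level document carries a couple of hundred nested documents on average, total RAM is a small fraction of the index size, and there are more shards than CPU cores on this single node.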