I have a basic aggregation on an index with about 40 million documents.
{
aggs: {
countries: {
filter: {
bool: {
must: my_filters,
}
},
aggs: {
filteredCountries: {
terms: {
field: 'countryId',
min_doc_count: 1,
size: 15,
}
}
}
}
}
}
The index:
{
"settings": {
"number_of_shards": 5,
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter",
"unique"
]
}
}
},
},
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "standard"
},
"countryId": {
"type": "short"
}
}
}
}
The search response time is 100ms, but the aggregation response time is about 1.5s, and is increasing as we add more documents (was about 200ms with 5 million documents). There are about 20 distinct countryId
right now.
What I tried so far:
- Allocating more RAM (from 4GB to 32GB), same results.
- Changing
countryId
field data type tokeyword
and addingeager_global_ordinals
option, it made things worse
The elasticsearch version is 7.8.0
, elastic has 8GB of ram, the server has 64GB of ram and 16CPU, 5 shards, 1 node
I use this aggregation to put filters in search results, so I need it to respond as fast as possible. For large number of results I don't need precision. so if it is approximate or even limited to a number (ex. 100 gte) it's great.
Any ideas how to speed up this aggregation ?