I have around 40 million records in my Elasticsearch index. I want to calculate the number of distinct values for the combination of two fields.
For example, given this set of documents:
[
    {
        "JobId" : 2,
        "DesigId" : 12
    },
    {
        "JobId" : 2,
        "DesigId" : 4
    },
    {
        "JobId" : 3,
        "DesigId" : 5
    },
    {
        "JobId" : 2,
        "DesigId" : 4
    },
    {
        "JobId" : 3,
        "DesigId" : 5
    }
]
For the above example, I should get a count of 3, since only 3 distinct (JobId, DesigId) combinations exist: [(2,12),(2,4),(3,5)]
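To make the expected result concrete, here is a minimal Python sketch of the counting semantics on the sample documents. It only illustrates what "distinct combination" means here; it is not how I would run this against 40 million records, and the function name `count_distinct_pairs` is just an illustrative name.

```python
# Sample documents from the example above.
docs = [
    {"JobId": 2, "DesigId": 12},
    {"JobId": 2, "DesigId": 4},
    {"JobId": 3, "DesigId": 5},
    {"JobId": 2, "DesigId": 4},
    {"JobId": 3, "DesigId": 5},
]


def count_distinct_pairs(documents):
    """Count distinct (JobId, DesigId) combinations exactly."""
    # A set keeps each pair once, so its size is the exact distinct count.
    return len({(d["JobId"], d["DesigId"]) for d in documents})


print(count_distinct_pairs(docs))  # prints 3
```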
I tried using a cardinality aggregation for this, but it only provides an approximate count. I need the exact count.
Below is the cardinality aggregation I used:
"aggs": {
    "counts": {
        "cardinality": {
            "script": "doc['JobId'].value + ',' + doc['DesigId'].value",
            "precision_threshold": 40000
        }
    }
}
I also tried a composite aggregation on the combination of the 2 fields, paging through the results with the after key and summing the number of buckets across all pages, but that is very slow and my query times out.
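The composite approach described above can be sketched roughly as follows in Python. This is a sketch under assumptions, not my exact code: the aggregation name `pairs`, the source names `job` and `desig`, the page size of 10000, and the `search` callable (any function that takes a request body and returns the response as a dict, for example a thin wrapper around a client's search call) are all hypothetical names chosen for illustration.

```python
from typing import Callable, Optional


def build_composite_body(after_key: Optional[dict], page_size: int = 10000) -> dict:
    """Build a request body that pages over distinct (JobId, DesigId) pairs."""
    composite = {
        "size": page_size,
        "sources": [
            {"job": {"terms": {"field": "JobId"}}},
            {"desig": {"terms": {"field": "DesigId"}}},
        ],
    }
    if after_key is not None:
        # Resume after the last bucket of the previous page.
        composite["after"] = after_key
    # "size": 0 skips returning hits; only the aggregation is needed.
    return {"size": 0, "aggs": {"pairs": {"composite": composite}}}


def count_pairs_via_composite(search: Callable[[dict], dict],
                              page_size: int = 10000) -> int:
    """Sum bucket counts over all composite pages to get an exact count."""
    total = 0
    after_key = None
    while True:
        response = search(build_composite_body(after_key, page_size))
        agg = response["aggregations"]["pairs"]
        buckets = agg["buckets"]
        total += len(buckets)  # each bucket is one distinct pair
        after_key = agg.get("after_key")
        if not buckets or after_key is None:
            break  # no more pages
    return total
```

Each page is a separate request, so the number of round trips grows with the number of distinct pairs divided by the page size, which is where the time goes in my case.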
Is there a more efficient way to get the exact count?