I have loaded data into my elasticsearch
cluster from hive using the elasticsearch-hadoop
plugin from elastic
.
I need to fetch a count of unique account numbers. I have the following queries written in both hql
and queryDSL
, BUT they are returning different counts.
Hive Query:
select count(distinct account) from <tableName> where capacity="550";
// Returns --> 71132
Similarly, in Elasticsearch the query looks like this:
{
"query": {
"bool": {
"must": [
{"match": { "capacity": "550"}}
]
}
},
"aggs": {
"unique_account": {
"cardinality": {
"field": "account"
}
}
}
}
// Returns --> 71607
Am I doing something wrong? What can I do to match the two queries?
Note:
There are exactly the same number of records in hive and elasticsearch.