I have a field that stores an array of strings. Different documents hold different sets of strings.
Example: "ftypes": ["PDF", "TXT", "XML"]
I used this aggregation query to analyze the usage of each file type:
{
  "aggs": {
    "list": {
      "terms": {
        "field": "ftypes",
        "min_doc_count": 0,
        "size": 100000
      }
    }
  }
}
Result:
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 137265,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "list": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "PDF",
          "doc_count": 134475
        },
        {
          "key": "TXT",
          "doc_count": 21312
        },
        {
          "key": "XML",
          "doc_count": 6597
        },
        {
          "key": "JPG",
          "doc_count": 1233
        }
      ]
    }
  }
}
The results were correct, as expected. Recently, however, I updated this field after removing XML file support, so none of the documents has the file type XML anymore. I can confirm that with this query:
{
  "query": {
    "terms": {
      "ftypes": ["XML"]
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
The total hit count is zero. The strange thing is that when I run the aggregation query above again, I still see XML as a term, now with a doc count of zero:
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 137265,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "list": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "PDF",
          "doc_count": 134475
        },
        {
          "key": "TXT",
          "doc_count": 21312
        },
        {
          "key": "JPG",
          "doc_count": 1233
        },
        {
          "key": "XML",
          "doc_count": 0
        }
      ]
    }
  }
}
Where is this XML term coming from if it does not exist in any document? Is there a cache that I need to clear?
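One detail that may be relevant: the aggregation request sets min_doc_count to 0, and the Elasticsearch documentation for the terms aggregation notes that with this setting, buckets with a zero count can come from terms that only belong to deleted documents. For comparison, here is a minimal sketch of the same aggregation with min_doc_count raised to 1. This is an assumption on my part and I have not run it against this index, but I would expect it to leave out the empty XML bucket:

```json
{
  "aggs": {
    "list": {
      "terms": {
        "field": "ftypes",
        "min_doc_count": 1,
        "size": 100000
      }
    }
  }
}
```

The only change from the original request is the min_doc_count value. It would hide the zero-count bucket from the response rather than explain where the term is stored.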