I have millions of records in Elasticsearch. Today I realized that some of them are duplicated. Is there any way to remove these duplicate records?
This is my query:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "match": { "sensorId": "14FA084408" } },
            { "match": { "variableName": "FORWARD_FLOW" } }
          ]
        }
      },
      "filter": {
        "range": {
          "timestamp": { "gt": "2015-07-04", "lt": "2015-07-06" }
        }
      }
    }
  }
}
And this is what I receive from it:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 21,
"max_score": 8.272615,
"hits": [
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxVcMpd7AZtvmZcK",
"_score": 8.272615,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxVnMpd7AZtvmZcL",
"_score": 8.272615,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxV6Mpd7AZtvmZcN",
"_score": 8.0957,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxWOMpd7AZtvmZcP",
"_score": 8.0957,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxW8Mpd7AZtvmZcT",
"_score": 8.0957,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxXFMpd7AZtvmZcU",
"_score": 8.0957,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxXbMpd7AZtvmZcW",
"_score": 8.0957,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxUtMpd7AZtvmZcG",
"_score": 8.077545,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxXPMpd7AZtvmZcV",
"_score": 8.077545,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
},
{
"_index": "iotsens-summarizedmeasures",
"_type": "summarizedmeasure",
"_id": "AU5isxUZMpd7AZtvmZcE",
"_score": 7.9553676,
"_source": {
"id": null,
"sensorId": "14FA084408",
"variableName": "FORWARD_FLOW",
"rawValue": "0.2",
"value": "0.2",
"timestamp": 1436047200000,
"summaryTimeUnit": "DAYS"
}
}
]
}
}
As you can see, I have 21 duplicate records for the same day. How can I delete the duplicates and preserve only one record per day? Thanks.
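To make the desired outcome concrete, here is a minimal Python sketch of the grouping involved. It is only an illustration under stated assumptions, not a tested solution: it assumes two records are duplicates when their `sensorId`, `variableName`, `timestamp` and `summaryTimeUnit` all match, and the function name `ids_to_delete` is hypothetical. It operates on a `hits.hits` list shaped like the response above (the sample reuses the first three hits) and does not talk to Elasticsearch at all.

```python
from collections import defaultdict

# Assumption: these fields together identify "the same record".
KEY_FIELDS = ("sensorId", "variableName", "timestamp", "summaryTimeUnit")


def ids_to_delete(hits, key_fields=KEY_FIELDS):
    """Return the _id of every hit except the first one seen per key.

    `hits` is a list of dicts shaped like the entries of `hits.hits`
    in a search response. (Hypothetical helper, for illustration only.)
    """
    groups = defaultdict(list)
    for hit in hits:
        source = hit["_source"]
        key = tuple(source.get(field) for field in key_fields)
        groups[key].append(hit["_id"])

    duplicates = []
    for ids in groups.values():
        # Keep the first document of each group, mark the rest for deletion.
        duplicates.extend(ids[1:])
    return duplicates


# The first three hits from the response above, reduced to the fields used.
_source = {
    "sensorId": "14FA084408",
    "variableName": "FORWARD_FLOW",
    "timestamp": 1436047200000,
    "summaryTimeUnit": "DAYS",
}
sample_hits = [
    {"_id": "AU5isxVcMpd7AZtvmZcK", "_source": dict(_source)},
    {"_id": "AU5isxVnMpd7AZtvmZcL", "_source": dict(_source)},
    {"_id": "AU5isxV6Mpd7AZtvmZcN", "_source": dict(_source)},
]

print(ids_to_delete(sample_hits))
# ['AU5isxVnMpd7AZtvmZcL', 'AU5isxV6Mpd7AZtvmZcN']
```

The returned ids are the documents that would have to be deleted, for instance one by one or through the bulk API. With millions of records, the whole index would presumably have to be read in batches (for example with a scroll) rather than with a single search like the one above.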