I'm fairly new to Elasticsearch and am trying to periodically delete documents using the _delete_by_query API. (I fully appreciate that I should probably be using time-based indices to make this easier, and I will be updating the indexing structure in due course, but for now I need to get this working.)
My index contains fields called ServiceName, message and timestamp (among others), and my requirement is pretty simple. I want to delete documents where ServiceName equals a specific value (myService), the message does NOT match either of two specific values (Starting* and Finished*, as I want to retain the first and last log message from any trace history), and the document is older than one day. I am using the _delete_by_query endpoint with the following payload:
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "match_phrase": {
            "ServiceName": {
              "query": "myService"
            }
          }
        },
        {
          "range": {
            "@timestamp": {
              "lte": "now-1d"
            }
          }
        }
      ],
      "should": [],
      "must_not": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "message": "Starting*"
                }
              },
              {
                "match_phrase": {
                  "message": "Finished*"
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
When I run the query using the _search API, it returns the data I'd expect to be deleted, but when I issued the same query to _delete_by_query, it deleted documents that were not returned in the search results. I am using AWS Elasticsearch Service. Can anybody tell me where I'm going wrong, or should this work? I initially thought it might be the minimum_should_match property, but the documentation seems to suggest it is irrelevant in this case.
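To make the comparison concrete, here is a minimal Python sketch of sending one identical body to both endpoints. It only builds the requests and does not contact a cluster. The index name my-index and the helper names build_query and build_requests are hypothetical, since the post does not give an index name. The sketch also leaves out the empty must/should arrays and the match_all filter, on the assumption that they do not change which documents match.

```python
import json

# Hypothetical index name: the real one is not stated in the post.
INDEX = "my-index"


def build_query(service_name="myService", max_age="now-1d",
                keep_phrases=("Starting*", "Finished*")):
    """Build the bool query from the payload above.

    Assumption: the empty "must"/"should" arrays and the "match_all"
    filter are omitted because they should not affect matching.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"ServiceName": {"query": service_name}}},
                    {"range": {"@timestamp": {"lte": max_age}}},
                ],
                "must_not": [
                    {
                        "bool": {
                            "should": [
                                {"match_phrase": {"message": phrase}}
                                for phrase in keep_phrases
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
            }
        }
    }


def build_requests(index=INDEX):
    """Return (method, path, body) for the dry run and for the delete."""
    # Serialise once so both requests carry byte-for-byte the same body.
    body = json.dumps(build_query(), sort_keys=True)
    return {
        "search": ("POST", f"/{index}/_search", body),
        "delete": ("POST", f"/{index}/_delete_by_query", body),
    }


if __name__ == "__main__":
    for name, (method, path, body) in build_requests().items():
        print(name, method, path)
        print(body)
```

Serialising the body once and reusing it rules out any accidental difference between the query that was searched and the query that was deleted with, which is the first thing worth excluding here.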