I want to replace an index with zero downtime, as described in the ES documentation.
I am doing so by:
- creating a new index my_index_v2 with the new data
- refreshing the new index
- then swapping them in an atomic operation, by performing the following request:
POST /_aliases
{
    "actions": [
        { "remove": { "index": "*", "alias": "my_index" }},
        { "add": { "index": "my_index_v2", "alias": "my_index" }}
    ]
}
This works as expected, except that it occasionally fails with a 404 response. The error message is:
{
    "error": {
        "root_cause": ... (same)
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_or_alias",
        "resource.id": "my_unrelated_index_v13",
        "index": "my_unrelated_index_v13"
    },
    "status": 404
}
- Afterwards, and only if the swap worked, we delete the now-unused indices that were associated with this and only this alias.
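To make the sequence concrete, here is a rough sketch of one cycle as I understand my own process, written as a pure Python function that only builds the requests and does not send them. The function name `build_swap_requests` and the old index name `my_index_v1` are made up for illustration, and the step that loads the data into the new index is left out.

```python
from typing import Any, Dict, List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]


def build_swap_requests(
    alias: str, new_index: str, old_indices: List[str]
) -> List[Request]:
    """Return the requests for one swap cycle, in the order they are sent.

    Hypothetical helper: it only builds the requests, nothing is sent.
    Loading data into the new index is omitted.
    """
    # Same body as the POST /_aliases request shown above, wildcard included.
    swap_body = {
        "actions": [
            {"remove": {"index": "*", "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    requests: List[Request] = [
        ("PUT", f"/{new_index}", None),            # create the new index
        ("POST", f"/{new_index}/_refresh", None),  # refresh it
        ("POST", "/_aliases", swap_body),          # atomic alias swap
    ]
    # Only reached if the swap succeeded: delete the now-unused indices.
    requests += [("DELETE", f"/{name}", None) for name in old_indices]
    return requests


if __name__ == "__main__":
    # "my_index_v1" is an assumed name for the previous index.
    for method, path, body in build_swap_requests(
        "my_index", "my_index_v2", ["my_index_v1"]
    ):
        print(method, path, body if body is not None else "")
```

Other jobs run the same kind of cycle against their own aliases and indices, so their DELETE requests can land at any point relative to the POST /_aliases above.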
The whole operation runs periodically, every few minutes. Similar operations may run at the same time in the cluster, on other aliases/indices. The error occurs randomly, once every several hours.
Is there a reason why these operations would interfere with each other? What is going on?
EDIT: clarified the DELETE step at the end.