We have been seeing a lot of failed Elasticsearch queries for the past few days. When I monitor the cluster health, CPU and JVM memory utilization are both high (almost 98%).
While debugging the issue, I found that the last automated snapshot has been in the IN_PROGRESS state for more than 20 days. I suspect this is the root cause.
However, I'm not sure what is causing the snapshot to run this long, and I have not been able to stop or delete it. When I sent an HTTP DELETE request for the repository from Postman with an AWS signature, I got a 401 Unauthorized error with the message "Your request is not allowed".
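For reference, here is a minimal sketch of the snapshot API paths involved, written as a small Python helper that only builds the method and URL for each call (it sends nothing, and signing is left out). The domain endpoint, the repository name `cs-automated`, and the snapshot name are placeholders and assumptions, not values from my cluster. As far as I understand, a DELETE on a specific snapshot (rather than on the repository) is the call that aborts a snapshot in progress, though a managed service may still reject it for automated snapshots.

```python
from urllib.parse import quote


def snapshot_urls(endpoint, repository, snapshot):
    """Build (method, url) pairs for common Elasticsearch snapshot API calls.

    All three arguments are caller-supplied placeholders; nothing is sent.
    """
    base = endpoint.rstrip("/")
    repo = quote(repository, safe="")
    snap = quote(snapshot, safe="")
    return {
        # Status of snapshots that are currently running
        "running": ("GET", f"{base}/_snapshot/_status"),
        # All snapshots in one repository, including their state
        "list": ("GET", f"{base}/_snapshot/{repo}/_all"),
        # Delete (or abort, if still in progress) a single snapshot
        "delete": ("DELETE", f"{base}/_snapshot/{repo}/{snap}"),
    }


if __name__ == "__main__":
    # Hypothetical endpoint and names, for illustration only
    calls = snapshot_urls("https://my-domain.example.com", "cs-automated", "my-snapshot")
    for name, (method, url) in calls.items():
        print(name, method, url)
```

The same method and URL pairs can be entered in Postman with the AWS signature settings already in use.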
Can anyone help me understand the long-running snapshot issue and how to resolve it?
Thanks in advance.