Background
I have an AWS-managed Elasticsearch v6.0 cluster that has 14 data instances.
It has time-based indices like data-2010-01, ..., data-2020-01.
Problem
Free storage space is very unbalanced across instances, as shown in the AWS console.
I have noticed that this distribution changes every time the AWS service runs through a blue-green deployment. This happens when cluster settings are changed or AWS releases an update.
Sometimes the blue-green deployment results in one of the instances completely running out of space. When this happens, the AWS service starts another blue-green deployment, and this resolves the issue without customer impact. (It does have an impact on my heart rate, though!)
Shard Size
Shard sizes for our indices are in the gigabytes but below the Elasticsearch recommendation of 50GB.
The shard size does vary by index, though. Lots of our older indices have only a handful of documents.
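To quantify how much shard size varies per instance, the shard count and stored bytes can be totalled per node from the output of the _cat/shards API. The following is a minimal sketch, not a tested tool: the function names are hypothetical, the endpoint passed to fetch_shard_rows is a placeholder for the domain URL, and it assumes the domain accepts unsigned HTTP requests (a domain with IAM-based access control would need signed requests instead). If the shard counts per node come out roughly equal while the byte totals differ widely, that would suggest the imbalance follows from the variation in shard size.

```python
import json
import urllib.request
from collections import defaultdict


def summarize_by_node(shard_rows):
    """Total shard count and stored bytes per node.

    `shard_rows` is the parsed JSON from
    GET _cat/shards?format=json&bytes=b (a list of dicts).
    """
    summary = defaultdict(lambda: {"shards": 0, "bytes": 0})
    for row in shard_rows:
        node = row.get("node")
        if not node:
            # Unassigned shards have no node and take no disk space.
            continue
        # Assumption: a relocating shard is reported as "source -> target ...";
        # count it against the source node only.
        node = node.split(" -> ")[0]
        summary[node]["shards"] += 1
        summary[node]["bytes"] += int(row.get("store") or 0)
    return dict(summary)


def fetch_shard_rows(endpoint):
    """Fetch shard rows from the cluster. `endpoint` is a placeholder URL."""
    url = (
        endpoint.rstrip("/")
        + "/_cat/shards?format=json&bytes=b&h=index,shard,prirep,node,store"
    )
    with urllib.request.urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative rows only; real data would come from fetch_shard_rows(...).
    sample = [
        {"index": "data-2010-01", "shard": "0", "prirep": "p",
         "node": "node-a", "store": "100"},
        {"index": "data-2020-01", "shard": "0", "prirep": "p",
         "node": "node-a", "store": "4000"},
        {"index": "data-2020-01", "shard": "1", "prirep": "p",
         "node": "node-b", "store": "5000"},
    ]
    for node, stats in sorted(summarize_by_node(sample).items()):
        print(node, stats["shards"], stats["bytes"])
```

Running the same summary before and after a blue-green deployment would show whether the shard counts or only the byte totals shift between instances.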
Question
It is unexpected that the AWS balancing algorithm does not balance well and that it produces a different result each time.
My question is: how does the algorithm choose which shards to allocate to which instance, and can I resolve this imbalance myself?