I have an issue with ES auto balancing shards in my cluster:
- I've seen it moving shards from nodes with more free disk space to nodes with less free disk space
- It has been moving shards non-stop for days. I assumed that was fine until I realized it was moving them across AWS availability zones (AZs), which costs money.
- I checked our nodes and they already look quite balanced: similar shard counts and similar free disk space.
Why is ES deciding to move shards, then? Is there a way to see why ES chooses to rebalance a shard in a particular way?
Thanks.
Edit:
Some possibly relevant configuration values we have set in our cluster:
Shard allocation settings
cluster.routing.allocation.enable => all
cluster.routing.allocation.node_concurrent_incoming_recoveries => 2
cluster.routing.allocation.node_concurrent_outgoing_recoveries => 2
cluster.routing.allocation.node_concurrent_recoveries => 8
cluster.routing.allocation.node_initial_primaries_recoveries => 4
cluster.routing.allocation.same_shard.host => false
Shard rebalancing settings
cluster.routing.allocation.allow_rebalance => indices_all_active
cluster.routing.allocation.cluster_concurrent_rebalance => 2
Shard balancing heuristics
cluster.routing.allocation.balance.shard => 0.45
cluster.routing.allocation.balance.index => 0.99f
cluster.routing.allocation.balance.threshold => 1.0
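For anyone who wants to compare against their own cluster, here is a minimal sketch of how a list like the one above could be collected. It assumes the cluster settings API (`GET /_cluster/settings` with `include_defaults=true` and `flat_settings=true`) and an unauthenticated node at a placeholder URL. The function names `effective_allocation_settings` and `fetch_cluster_settings` are made up for illustration and are not part of any Elasticsearch client.

```python
import json
import urllib.request

# Prefix shared by all the settings listed above.
PREFIX = "cluster.routing.allocation."


def effective_allocation_settings(response, prefix=PREFIX):
    """Flatten a cluster-settings response into the effective values.

    Later sections override earlier ones: defaults < persistent < transient.
    Only keys starting with `prefix` are kept, sorted by name.
    """
    merged = {}
    for section in ("defaults", "persistent", "transient"):
        merged.update(response.get(section, {}))
    return {k: merged[k] for k in sorted(merged) if k.startswith(prefix)}


def fetch_cluster_settings(base_url):
    """Fetch flat cluster settings, including defaults, from one node.

    `base_url` is an assumption, e.g. "http://localhost:9200" (no auth).
    """
    url = (base_url.rstrip("/")
           + "/_cluster/settings?include_defaults=true&flat_settings=true")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Usage (requires a reachable cluster; the URL is a placeholder):
#   raw = fetch_cluster_settings("http://localhost:9200")
#   for key, value in effective_allocation_settings(raw).items():
#       print(f"{key} => {value}")
```

Merging the three sections in that order matters because a transient or persistent override would otherwise be hidden behind the default value.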
BTW:
- I'm using Elasticsearch 6.8
- My data nodes already have 'cluster.routing.allocation.awareness.attributes' set to 'aws_availability_zone'
I'm asking here because I posted two similar questions on the Elastic forum ("Discuss the Elastic Stack") and didn't get answers there.
There is a similar SO question here but it didn't get any answers. Also, the version they were using at the start of 2017 is likely different from the one I'm using now.