
I have an issue with how ES automatically rebalances shards in my cluster:

  • I've seen it moving shards from nodes with more free disk space to nodes with less free disk space.
  • It has been moving shards non-stop for days. I thought that was OK until I realized it was moving them across AZs (AWS availability zones), which costs money.
  • I checked our nodes and they already looked quite balanced: similar shard counts and similar free disk space.
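
To make "quite balanced" concrete, here is a rough sketch of the kind of check I mean. It assumes the rows come from `GET _cat/allocation?format=json&bytes=b`, where every value is a string. The `balance_spread` helper, the node names, and the numbers are made up for illustration and are not output from our cluster.

```python
from typing import Dict, List


def balance_spread(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Summarize how evenly shards and free disk are spread across nodes.

    `rows` is assumed to be the JSON list returned by
    GET _cat/allocation?format=json&bytes=b (all values are strings).
    """
    # Skip the UNASSIGNED pseudo-row, which carries no disk figures.
    nodes = [r for r in rows if r.get("node") != "UNASSIGNED"]
    shards = [int(r["shards"]) for r in nodes]
    free = [int(r["disk.avail"]) for r in nodes]
    return {
        "shards_min": min(shards),
        "shards_max": max(shards),
        # Ratio of least to most free disk: 1.0 means perfectly even.
        "free_disk_ratio": min(free) / max(free),
    }


# Hypothetical sample rows, not real cluster output.
sample = [
    {"node": "data-1", "shards": "120", "disk.avail": "500000000000"},
    {"node": "data-2", "shards": "118", "disk.avail": "480000000000"},
    {"node": "data-3", "shards": "121", "disk.avail": "510000000000"},
]
print(balance_spread(sample))
```

A narrow shard-count range and a free-disk ratio close to 1.0 is what I mean by "balanced" here. This only looks at per-node totals, not at per-index distribution.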

Why is ES deciding to move shards, then? Is there a way to see why ES decides to rebalance a shard in a certain way?

Thanks.

Edit:

Some possibly relevant configuration values we have set in our cluster:

Shard allocation settings
   cluster.routing.allocation.enable => all
   cluster.routing.allocation.node_concurrent_incoming_recoveries => 2
   cluster.routing.allocation.node_concurrent_outgoing_recoveries => 2
   cluster.routing.allocation.node_concurrent_recoveries => 8
   cluster.routing.allocation.node_initial_primaries_recoveries => 4
   cluster.routing.allocation.same_shard.host => false

Shard rebalancing settings
   cluster.routing.allocation.allow_rebalance => indices_all_active
   cluster.routing.allocation.cluster_concurrent_rebalance => 2

Shard balancing heuristics
   cluster.routing.allocation.balance.shard => 0.45
   cluster.routing.allocation.balance.index => 0.99f
   cluster.routing.allocation.balance.threshold => 1.0
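
For anyone who wants to compare against their own cluster, here is a minimal sketch of how values like these can be collected. It assumes the cluster settings API is called as `GET /_cluster/settings?include_defaults=true&flat_settings=true`. The helper names, the base URL, and the sample dictionary are placeholders for illustration, not what I actually ran.

```python
import json
import urllib.request
from typing import Dict

PREFIX = "cluster.routing.allocation."


def allocation_settings(flat: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Keep only allocation keys, letting transient override persistent
    and persistent override defaults."""
    merged: Dict[str, str] = {}
    for section in ("defaults", "persistent", "transient"):
        for key, value in flat.get(section, {}).items():
            if key.startswith(PREFIX):
                merged[key] = value
    return merged


def fetch_settings(base_url: str) -> Dict[str, Dict[str, str]]:
    """Fetch flat cluster settings; `base_url` is a placeholder such as
    http://localhost:9200 (no authentication handled here)."""
    url = base_url + "/_cluster/settings?include_defaults=true&flat_settings=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hypothetical response body, used here instead of a live cluster.
sample = {
    "defaults": {
        "cluster.routing.allocation.enable": "all",
        "cluster.routing.allocation.cluster_concurrent_rebalance": "2",
        "cluster.name": "example",
    },
    "persistent": {},
    "transient": {
        "cluster.routing.allocation.cluster_concurrent_rebalance": "4",
    },
}
for key, value in sorted(allocation_settings(sample).items()):
    print(f"{key} => {value}")
```

With a live cluster, `allocation_settings(fetch_settings("http://localhost:9200"))` would replace the sample dictionary.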

BTW:

  • I'm using Elasticsearch 6.8
  • My data nodes already have 'cluster.routing.allocation.awareness.attributes' set to 'aws_availability_zone'
  • I posted two similar questions on Elastic's forum, "Discuss the Elastic Stack", but didn't get any answers there, so I'm asking here.

  • There is a similar SO question here but it didn't get any answers. Also, the version they were using at the start of 2017 is likely different from the one I'm using now.

Daniel