Problem
In a 3-node Akka actor cluster, remembered entities are not recreated properly during a rolling restart of all 3 nodes.
The shards are completely rebalanced, but some of the entities are not recreated.
Cluster Configurations
akka.cluster.sharding.remember-entities = on
akka.cluster.sharding.remember-entities-store = ddata
akka.cluster.sharding.distributed-data.durable.keys = []
akka.remote.artery {
  enabled = on
  transport = tcp
}
At startup, each of the 3 nodes hosts 100 shards with 1000 actors, for a total of 300 shards and 3000 actors.
- Node 1 -- 100 Shards / 1000 Actors
- Node 2 -- 100 Shards / 1000 Actors
- Node 3 -- 100 Shards / 1000 Actors
1. When Node 1 goes down, the shards on Node 1 are rebalanced to Node 2 and Node 3, and all the remembered entities are recreated on those nodes.
- Node 1 -- Down
- Node 2 -- 150 Shards / 1500 Actors
- Node 3 -- 150 Shards / 1500 Actors
2. When Node 1 is back up, Node 2 goes down a few moments later. The shards and remembered entities from Node 2 are recreated on Node 1.
- Node 1 -- 150 Shards / 1500 Actors
- Node 2 -- Down
- Node 3 -- 150 Shards / 1500 Actors
3. When Node 2 is back up, Node 3 goes down a few moments later. The shards from Node 3 are rebalanced to Node 2, but some of the remembered entities from Node 3 are not recreated on Node 2. All the shards are rebalanced anyway.
- Node 1 -- 150 Shards / 1500 Actors
- Node 2 -- 150 Shards / 1423 Actors
- Node 3 -- Down
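The counts from the three steps can be tallied to make the discrepancy explicit. This is only a bookkeeping sketch in Python using the numbers reported above; the `steps` structure and the `missing_entities` helper are hypothetical names for illustration and are not part of Akka.

```python
# Expected total of remembered entities, from the initial layout
# (3 nodes x 1000 actors each).
EXPECTED_ENTITIES = 3 * 1000

# Actor counts per live node after each step of the rolling restart,
# copied from the observations above (the node that is down is omitted).
steps = {
    "step 1 (Node 1 down)": {"Node 2": 1500, "Node 3": 1500},
    "step 2 (Node 2 down)": {"Node 1": 1500, "Node 3": 1500},
    "step 3 (Node 3 down)": {"Node 1": 1500, "Node 2": 1423},
}


def missing_entities(actors_per_node, expected=EXPECTED_ENTITIES):
    """Return how many remembered entities were not recreated."""
    return expected - sum(actors_per_node.values())


for name, actors_per_node in steps.items():
    print(f"{name}: {missing_entities(actors_per_node)} entities missing")
```

With the reported numbers, only step 3 shows a gap: 77 of the 3000 entities are not recreated.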
The issue here is:
When we restart Node 3 right after Node 2 has rejoined the cluster, the recreation of remembered entities is inconsistent.
In the meantime, messages are still being sent to the actors in the cluster.
What could be the bottleneck when Node 3 is restarted right after Node 2 joins?
Tried
1. If we do not restart Node 3, there is no issue with the entities.
2. If we wait some time before restarting Node 3 in the rolling restart, there is no problem.
3. Increased/decreased the shard count.
4. Changed akka.cluster.distributed-data.majority-min-cap from the default 5 to 3; the issue still persists.
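One possible reason the majority-min-cap change made no visible difference is that, in a cluster this small, both values may resolve to the same number of replicas. The sketch below assumes the effective majority is min(N, max(N/2 + 1, min-cap)) for N nodes. That rule is an assumption about how the setting is applied and should be verified against the Akka version in use; `effective_majority` is a hypothetical helper, not an Akka API.

```python
def effective_majority(number_of_nodes, min_cap):
    """Assumed rule: plain majority, raised to min_cap, capped at cluster size."""
    majority = number_of_nodes // 2 + 1
    return min(number_of_nodes, max(majority, min_cap))


# Compare the default (5) with the value that was tried (3),
# for the full cluster and for the cluster with one node down.
for nodes in (3, 2):
    for min_cap in (5, 3):
        replicas = effective_majority(nodes, min_cap)
        print(f"nodes={nodes} min-cap={min_cap} -> replicas={replicas}")
```

Under that assumption, a min-cap of 5 and a min-cap of 3 both end up requiring every available node in a cluster of 3 (or 2 during the restart), so lowering the value would not be expected to change the behaviour.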