
We have a service built from 16 Aeron clusters. Each cluster contains 3 nodes, so we have 48 nodes in total. When we do a release, we send "kill -15" to the leader node, which then calls "CloseHelper.quietCloseAll(clusteredServiceContainer, consensusModule)" to give up its leader role. However, we have run into some problems at this stage:

1. Do you recommend using this mechanism to step down from the leader role, or is there another recommended approach?

2. After the leader nodes finish this step, each of the 16 clusters elects a new leader (the new leader receives the "onRoleChange" callback). However, we observed large differences in election time across the clusters, roughly 500 ms to 1400 ms. Our configuration is: election.time=100ms, leaderHeartbeatTimeoutNs=10s, leaderHeartbeatIntervalNs=200ms. Why does this happen? Why do different clusters take different amounts of time to elect a leader? Is there a way to reduce the election time as much as possible?

3. After the leader nodes finish this step, some nodes take up to 10 s (the value of leaderHeartbeatTimeoutNs) to complete the election. Why does this happen?
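The shutdown path and timeout settings described above can be sketched roughly as follows. This is a minimal, self-contained illustration under assumptions, not the actual service code: "component", "applyTimeouts" and the local "quietCloseAll" are hypothetical stand-ins (the real call is Agrona's "CloseHelper.quietCloseAll", and the real components are "ClusteredServiceContainer" and "ConsensusModule"). The property names are Aeron Cluster's system properties for the election timeout and leader heartbeat timeout/interval; treating "election.time" as the election timeout is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderShutdownSketch {
    // Records the order in which components are closed (illustration only).
    static final List<String> CLOSED = new ArrayList<>();

    // Assumption: "election.time" in the question means the election timeout.
    // Must be set before the ConsensusModule context is created, because the
    // context reads its defaults from these system properties at construction.
    static void applyTimeouts() {
        System.setProperty("aeron.cluster.election.timeout", "100ms");
        System.setProperty("aeron.cluster.leader.heartbeat.timeout", "10s");
        System.setProperty("aeron.cluster.leader.heartbeat.interval", "200ms");
    }

    // Hypothetical stand-in for ClusteredServiceContainer / ConsensusModule.
    static AutoCloseable component(final String name) {
        return () -> CLOSED.add(name);
    }

    // Simplified stand-in for Agrona's CloseHelper.quietCloseAll:
    // closes each non-null resource in argument order and swallows exceptions.
    static void quietCloseAll(final AutoCloseable... closeables) {
        for (final AutoCloseable closeable : closeables) {
            if (closeable == null) {
                continue;
            }
            try {
                closeable.close();
            } catch (final Exception ignore) {
                // Deliberately ignored, as the "quiet" variant does.
            }
        }
    }

    public static void main(final String[] args) {
        applyTimeouts();
        final AutoCloseable clusteredServiceContainer = component("clusteredServiceContainer");
        final AutoCloseable consensusModule = component("consensusModule");

        // The JVM runs shutdown hooks on SIGTERM (kill -15) and on normal exit,
        // but not on SIGKILL (kill -9).
        Runtime.getRuntime().addShutdownHook(new Thread(
            () -> quietCloseAll(clusteredServiceContainer, consensusModule), "cluster-shutdown"));

        System.out.println("Shutdown hook registered; it runs on JVM exit, including after kill -15");
    }
}
```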

(Aeron version: 1.40.0)

Has anyone encountered a similar situation?

Lei
