
I have two programs, fc (failoverController) and web (webServer), and I use ZooKeeper to ensure high reliability.

fc is deployed on two servers. The two fc instances use the apache-curator LeaderSelector to elect a master; the master starts a web process, and the web process provides the services. In order not to give up leadership, I use a while(true) loop at the end of takeLeadership().

In one situation, however, our customer deployed ZooKeeper on three VMware ESXi virtual machines, and they snapshot the three VMs (including VM memory) every day.

One day something strange happened: fc1 became master and, a few milliseconds later, fc2 became master as well. The interval between the two elections was very short. This triggered a bug in our program: we ended up with two masters.

To fix this problem, we use an AtomicBoolean variable that records whether the zk connection state has become LOST or SUSPENDED, and takeLeadership() checks this variable to decide whether to exit.
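To make the idea concrete, here is a minimal sketch of the flag-based exit using only the JDK. `ZkState` is a hypothetical stand-in for Curator's `ConnectionState`, and `LeadershipGuard`, `onStateChanged`, `holdLeadership` and `demo` are illustrative names, not our real code and not Curator API. In the real program the first method would be called from the connection state listener and the second would be the body of takeLeadership().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Curator's ConnectionState; only the values used here.
enum ZkState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

class LeadershipGuard {
    // Set when the connection is reported as SUSPENDED or LOST.
    private final AtomicBoolean connectionBroken = new AtomicBoolean(false);

    // In real code this would be driven by the connection state listener.
    void onStateChanged(ZkState newState) {
        if (newState == ZkState.SUSPENDED || newState == ZkState.LOST) {
            connectionBroken.set(true);
        }
    }

    // In real code this would be the body of takeLeadership().
    // It returns (giving up leadership) once the flag is set or the thread is interrupted.
    void holdLeadership(Runnable startWeb, Runnable stopWeb) {
        connectionBroken.set(false);   // start every leadership term with a clean flag
        startWeb.run();
        try {
            while (!connectionBroken.get()) {
                Thread.sleep(10);      // poll instead of spinning in while(true)
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stopWeb.run();             // always stop the web process before returning
        }
    }

    // Small self-check: a SUSPENDED event must end the leadership term.
    static boolean demo() {
        LeadershipGuard guard = new LeadershipGuard();
        AtomicInteger started = new AtomicInteger();
        AtomicInteger stopped = new AtomicInteger();
        Thread leader = new Thread(() ->
                guard.holdLeadership(started::incrementAndGet, stopped::incrementAndGet));
        leader.start();
        try {
            while (started.get() == 0) {
                Thread.sleep(1);       // wait until the "web process" is up
            }
            guard.onStateChanged(ZkState.SUSPENDED);
            leader.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !leader.isAlive() && started.get() == 1 && stopped.get() == 1;
    }
}
```

The loop also treats thread interruption as a reason to stop, since Curator's `LeaderSelectorListenerAdapter` reacts to SUSPENDED and LOST by throwing `CancelLeadershipException`, which interrupts the thread running takeLeadership().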

Now I want to test this two-master case. How can I build a scenario in which zookeeper jitter causes multiple rapid elections?

I have tried the following operations, but could not reproduce the problem:

  1. Frequently restarting the zk services.
  2. Using tcpkill to kill the connection from one of the fc instances to the zk port.
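The situation I am trying to provoke can be described with a toy, single-threaded model: the current leader's session is lost while its process stays alive, so the next participant is elected. Everything below is hypothetical (`ToyController`, `ToyElection`, `expireLeaderSession`, `worstOverlap`); none of it is Curator or ZooKeeper API, and it only illustrates what a real test would have to trigger.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one fc instance; not a Curator type.
class ToyController {
    final String name;
    final boolean exitsOnLost;   // true = the flag-based fix is in place
    boolean serving = false;     // true while this controller believes it is master

    ToyController(String name, boolean exitsOnLost) {
        this.name = name;
        this.exitsOnLost = exitsOnLost;
    }

    void takeLeadership() {
        serving = true;
    }

    void onSessionLost() {
        if (exitsOnLost) {
            serving = false;     // leave takeLeadership() and stop the web process
        }
        // Without the fix the while(true) loop keeps running, so serving stays true.
    }
}

// Toy model of the election queue; the head of the list is the elected leader.
class ToyElection {
    private final List<ToyController> queue = new ArrayList<>();
    int maxSimultaneousMasters = 0;

    ToyElection(List<ToyController> controllers) {
        queue.addAll(controllers);
        queue.get(0).takeLeadership();
        record();
    }

    // Models the leader's session being lost: it drops out of the election,
    // rejoins at the back of the queue, and the next participant is elected.
    void expireLeaderSession() {
        ToyController old = queue.remove(0);
        old.onSessionLost();
        queue.add(old);
        queue.get(0).takeLeadership();
        record();
    }

    private void record() {
        int masters = 0;
        for (ToyController c : queue) {
            if (c.serving) {
                masters++;
            }
        }
        maxSimultaneousMasters = Math.max(maxSimultaneousMasters, masters);
    }

    // Runs `rounds` back-to-back session losses and reports the worst overlap seen.
    static int worstOverlap(boolean exitsOnLost, int rounds) {
        List<ToyController> fcs = new ArrayList<>();
        fcs.add(new ToyController("fc1", exitsOnLost));
        fcs.add(new ToyController("fc2", exitsOnLost));
        ToyElection election = new ToyElection(fcs);
        for (int i = 0; i < rounds; i++) {
            election.expireLeaderSession();
        }
        return election.maxSimultaneousMasters;
    }
}
```

In this model, `ToyElection.worstOverlap(false, 3)` reports two simultaneous masters (the bug), while `ToyElection.worstOverlap(true, 3)` never reports more than one. Restarting zk or using tcpkill did not get the real system into the state that `expireLeaderSession()` represents here.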
hehe
  • Try out "Circuit Breaking ConnectionStateListener" and see if that helps: http://curator.apache.org/utilities.html – Randgalt Jun 22 '21 at 04:36
  • Thanks, Randgalt. I read this page and found that the "Circuit Breaking ConnectionStateListener" controls whether state changes are sent to the listener or not. Can you describe in detail how to build a scenario that forces rapid zk elections, so I can test my solution? – hehe Jun 22 '21 at 08:39
  • No, I'm sorry, I don't have the time to do that. But look at this code snippet here and give it a try: https://github.com/apache/curator/blob/master/curator-framework/src/main/java/org/apache/curator/framework/state/CircuitBreakingConnectionStateListener.java#L66 – Randgalt Jun 22 '21 at 10:17

0 Answers