
According to https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup

Cross Machine Requirements

For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.

To achieve the highest probability of tolerating a failure you should try to make machine failures independent. For example, if most of the machines share the same switch, failure of that switch could cause a correlated failure and bring down the service. The same holds true of shared power circuits, cooling systems, etc.

My question is: what should we do after we identify a node failure in the ZooKeeper cluster to bring it back to 2F+1 machines? Do we need to restart all the ZooKeeper nodes? Note that the clients connect to the ZooKeeper cluster by DNS name, and the recovered node uses the same DNS name.

For example:

10.51.22.89 zookeeper1
10.51.22.126 zookeeper2
10.51.23.216 zookeeper3

If 10.51.22.89 dies and we bring up 10.51.22.90 as zookeeper1, will all the nodes pick up this change?
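For illustration, here is roughly what such a name-based setup could look like; the /etc/hosts entries, the /opt/zookeeper paths and the 2888:3888 ports are assumptions, not details from the question.

    # A minimal sketch of the name-based setup, with assumed paths.
    cat /etc/hosts
    # 10.51.22.90   zookeeper1    <- was 10.51.22.89 before the failure
    # 10.51.22.126  zookeeper2
    # 10.51.23.216  zookeeper3

    grep '^server\.' /opt/zookeeper/conf/zoo.cfg
    # server.1=zookeeper1:2888:3888
    # server.2=zookeeper2:2888:3888
    # server.3=zookeeper3:2888:3888
    # Identical on every node; because it uses hostnames, nothing here changes
    # when zookeeper1 moves to a new IP.

If the server list were written with raw IPs instead, every node's zoo.cfg would have to be edited whenever a machine is replaced.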

Chong Wang

1 Answer


If you connect 10.51.22.90 as zookeeper1 (with the same myid file and configuration that 10.51.22.89 had before) and the data dir is empty, the process will connect to the current leader (zookeeper2 or zookeeper3) and copy a snapshot of the data. After successful initialization the node informs the rest of the cluster, and you have 2F+1 again.
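A minimal sketch of that procedure on the replacement host, assuming a stock install under /opt/zookeeper with dataDir=/var/lib/zookeeper and scp access to zookeeper2 (all of these are assumptions, not part of the answer):

    # Reuse the configuration the failed node had; any surviving node has a copy.
    scp zookeeper2:/opt/zookeeper/conf/zoo.cfg /opt/zookeeper/conf/zoo.cfg

    # Recreate the myid file with the id the failed zookeeper1 used (server.1 -> 1).
    echo 1 > /var/lib/zookeeper/myid

    # Make sure there are no stale snapshots or transaction logs in the data dir.
    rm -rf /var/lib/zookeeper/version-2

    # Start the server; it contacts the current leader and syncs a snapshot.
    /opt/zookeeper/bin/zkServer.sh start

    # After initialization it should report itself as a follower.
    /opt/zookeeper/bin/zkServer.sh status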

Try this yourself while keeping tail -f on the log files. It won't hurt the cluster and you will learn a lot about ZooKeeper internals ;-)
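For example, something along these lines (the log location is an assumption, and on recent releases the srvr/mntr four-letter-word commands must be whitelisted via 4lw.commands.whitelist):

    # Watch the replacement node sync with the leader and join the quorum.
    tail -f /opt/zookeeper/logs/zookeeper*.out

    # Ask the new node for its role; it should answer "Mode: follower".
    echo srvr | nc zookeeper1 2181

    # On whichever node is the leader, zk_synced_followers should go back to 2.
    echo mntr | nc zookeeper2 2181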

Mariusz
  • Thanks @Mariusz. In our practice we ran into an issue like https://stackoverflow.com/questions/22155494/why-cant-my-zookeeper-server-rejoin-the-quorum, so we had to do a rolling restart of all the ZooKeeper nodes (see the sketch after these comments). – Chong Wang Sep 07 '17 at 19:01
  • What version do you use? The question you referenced is more than 3 years old and applies to zookeeper 3.4.5. – Mariusz Sep 07 '17 at 22:24
  • Hi @Mariusz, do you recommend using a supervisory process for the ZooKeeper servers that will attempt to restart ZooKeeper if it dies (assuming no node failure)? – jumping_monkey Dec 19 '19 at 02:34
  • @jumping_monkey Sure, I recommend adding a supervisor process to watch all the daemons you use. – Mariusz Dec 19 '19 at 10:03
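For completeness, a rough sketch of the rolling restart mentioned in the comments above; the hostnames, install path and SSH access are assumptions. Restarting one node at a time keeps a 2-out-of-3 quorum up throughout, and restarting the leader last keeps re-elections to a minimum.

    for host in zookeeper1 zookeeper2 zookeeper3; do
        ssh "$host" "/opt/zookeeper/bin/zkServer.sh restart"
        # Wait until the restarted node reports a role again before moving on.
        until ssh "$host" "/opt/zookeeper/bin/zkServer.sh status" 2>/dev/null \
                | grep -Eq "Mode: (leader|follower)"; do
            sleep 2
        done
    done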