According to https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup:

Cross Machine Requirements

For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines.
To achieve the highest probability of tolerating a failure you should try to make machine failures independent. For example, if most of the machines share the same switch, failure of that switch could cause a correlated failure and bring down the service. The same holds true of shared power circuits, cooling systems, etc.
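To check my reading of the majority rule quoted above, here is a small Python sketch of the arithmetic. The function names `quorum_size` and `tolerated_failures` are my own and are not part of any ZooKeeper API; the only assumption is that a quorum means a strict majority of the ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority still remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Ensemble sizes mentioned in the quoted documentation.
for n in (3, 5, 6):
    print(f"{n} machines: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

This gives one tolerated failure for three machines and two for both five and six, which matches the documentation's point that a sixth machine adds no extra fault tolerance.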
My question is: what should we do after we have identified a node failure in a ZooKeeper cluster to bring the cluster back to 2F+1 machines? Do we need to restart all the ZooKeeper nodes? Also, the clients connect to the ZooKeeper cluster by DNS name; suppose the recovered node uses the same DNS name as the one that failed.
For example:
10.51.22.89 zookeeper1
10.51.22.126 zookeeper2
10.51.23.216 zookeeper3
If 10.51.22.89 dies and we bring up 10.51.22.90 as zookeeper1, can all the nodes identify this change?
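To make the scenario concrete, here is a rough Python sketch of what I mean by addressing the ensemble by name. It assumes the clients use ZooKeeper's default client port 2181 and build their connect string from host names only; the helper `connect_string` and the two dictionaries are hypothetical and just model the hosts entries above.

```python
CLIENT_PORT = 2181  # assumption: ZooKeeper's default client port

# Name-to-address mapping before the failure (from the example above).
hosts_before = {
    "zookeeper1": "10.51.22.89",
    "zookeeper2": "10.51.22.126",
    "zookeeper3": "10.51.23.216",
}

# After the failure, zookeeper1 points at the replacement machine.
hosts_after = dict(hosts_before, zookeeper1="10.51.22.90")


def connect_string(names, port=CLIENT_PORT):
    """Build a comma-separated host:port list from host names only."""
    return ",".join(f"{name}:{port}" for name in names)


before = connect_string(sorted(hosts_before))
after = connect_string(sorted(hosts_after))

print(before)
print(before == after)  # the names are unchanged, only the address differs
```

The connect string is identical before and after the replacement, so nothing in the client configuration changes. What this sketch does not show, and what I am asking about, is whether the running servers and clients actually re-resolve the name to the new address.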