A consensus algorithm (e.g. Raft) requires that the cluster contain an odd number of nodes to avoid the split-brain problem.

Say I have a cluster of 5 nodes. What will happen if only one node fails? The cluster then has 4 nodes, which breaks the odd-number rule. Will the cluster continue to behave correctly?

One solution is to drop one more node so that the cluster contains only 3 nodes. But what if the previously failed node comes back? Then the cluster has 4 nodes again, and we have to bring the previously dropped node back in order to keep the cluster size odd.

Do implementations of the consensus algorithm handle this problem automatically, or do I have to handle it in my application code (for example, by dropping a node)?

zx_wing

1 Answer

Yes, the cluster will continue to work normally. A cluster of N nodes, where N is odd (N = 2k + 1), can tolerate k node failures. As long as a majority of nodes is alive, the cluster can work normally. If one node fails and a majority remains, everything is fine. Only when you lose the majority do you have a problem.

There is no reason to force the cluster to have an odd number of nodes; implementations don't consider this a problem and therefore don't handle it (for example, by dropping nodes). You can run a consensus algorithm on an even number of nodes, but an odd number usually makes more sense.

A 3-node cluster can tolerate 1 node failure (the majority is 2 nodes).
A 4-node cluster can tolerate 1 node failure (the majority is 3 nodes).
A 5-node cluster can tolerate 2 node failures (the majority is 3 nodes).
A 6-node cluster can tolerate 2 node failures (the majority is 4 nodes).
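
The numbers above follow from simple integer arithmetic. Here is a minimal Python sketch of it; the function names `majority` and `tolerated_failures` are made up for illustration and do not come from any Raft library:

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that can fail while a majority is still alive."""
    return n - majority(n)


# Print the majority size and failure tolerance for cluster sizes 3 to 6.
for n in range(3, 7):
    print(f"{n} nodes: majority = {majority(n)}, "
          f"tolerated failures = {tolerated_failures(n)}")
```

Going from an odd size to the next even size raises the majority by one without raising the number of failures the cluster can survive.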

I hope this makes it clearer why an odd cluster size makes more sense: it tolerates the same number of node failures with fewer nodes in the cluster.

msantl
  • Thanks for your answer. I understand that the cluster continues to run because it still has a majority. But what if a network partition happens then? Since the cluster has 4 nodes (one node has already failed), it could split into two two-node groups, neither of which can elect a leader. – zx_wing May 16 '19 at 12:46
  • Yes, that case can happen, and there is not much you can do about it in the consensus algorithm or in your application code. The only way to regain consensus is to resolve the network partition so that a majority can form again. – msantl May 16 '19 at 12:49
  • Consensus will protect against split brain during a network partition. It won't guarantee availability. Even with all 5 nodes up, if you have double network partitioning, a leader cannot be elected if no partition has 3 nodes. – Hari Krishna S Jul 15 '22 at 07:38
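
The partition cases discussed in these comments can be checked with a small sketch. It assumes the quorum is computed against the configured cluster size (5 here), so a failed node still counts toward the total. The helper `partition_with_quorum` is a hypothetical name for illustration, not part of any consensus implementation:

```python
from typing import List, Optional


def partition_with_quorum(cluster_size: int,
                          partition_sizes: List[int]) -> Optional[int]:
    """Return the index of the partition holding a strict majority of the
    configured cluster, or None if no partition can elect a leader."""
    quorum = cluster_size // 2 + 1
    for index, size in enumerate(partition_sizes):
        if size >= quorum:
            return index
    return None


# 5 configured nodes, one failed, the remaining 4 split 2/2: no leader.
print(partition_with_quorum(5, [2, 2]))
# 5 nodes up, split into three groups of 2, 2 and 1: no leader.
print(partition_with_quorum(5, [2, 2, 1]))
# 5 configured nodes, one failed, the remaining 4 split 3/1: first group has a quorum.
print(partition_with_quorum(5, [3, 1]))
```

At most one partition can ever hold a strict majority, which is why split brain is ruled out even when availability is lost.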