
I saw a closely related question asked here, but the provided answer doesn't entirely address my doubts.

Assume a 3 node cluster is configured without a witness node, and all nodes are able to talk to each other.

[Diagram: three nodes A, B, and C, each connected to the other two]

As the number of nodes is odd, it should be able to achieve a quorum by itself, assuming at least 2 nodes are up and running and able to communicate with each other. This prevents a split-brain situation where multiple nodes could run a service that should only have one active instance at a time.
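To make the majority rule concrete, here is a minimal sketch, assuming one vote per node and a plain "more than half" rule. The function name `has_quorum` is made up for illustration and is not taken from any cluster product.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all configured votes."""
    return reachable_votes > total_votes // 2


# 3-node cluster, one vote per node (assumption for this sketch)
two_of_three = has_quorum(2, 3)   # two nodes that can see each other
one_of_three = has_quorum(1, 3)   # a single isolated node
two_of_four = has_quorum(2, 4)    # an even split is not a majority

print(two_of_three, one_of_three, two_of_four)
```

With an odd node count there is no even split, which is why a 3-node cluster can do without a witness.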

What if one of the connections breaks and nodes A and B cannot talk to each other, like this:

[Diagram: nodes A, B, and C; the A-B link is broken, while A-C and B-C still work]

Then the pairs (A,C) and (B,C) could each achieve a separate quorum (green and red) and potentially introduce a split-brain situation: if B was running UniqueService, node A would not know that B is still running and could decide to start it as well. My question is:

a) Is this a real problem? If not, why not?

b) If not, is it because both the AC and BC pairs can communicate, so any communication between A and B would still work through A <-> C <-> B? And if A and B really can't communicate, is that due to a bigger issue that quorum is not even trying to solve?

user5539357

1 Answer


You've pretty much answered your own question here, but just to confirm: yes, that situation shouldn't ever arise if you're following best practices when implementing such a cluster.

If A<->C can communicate, and B<->C can communicate, there shouldn't really be any possibility that A<->B can't communicate.

Also, since C is present in both cases, it would not cast a vote to create quorum twice - that's the whole point of quorum. Each node only has one vote.

Depending on the cluster configuration, I would say that even if you were to somehow invent a scenario in which A and B can only communicate with C and not each other, then a cluster would either a) not be formed at all or b) be formed once, as node C would basically be the deciding vote as to which nodes will be involved in the cluster (it and one other).
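As a rough illustration of the "each node only has one vote" idea, here is a minimal sketch in which a node grants its vote to at most one candidate per round. This is an assumption made for illustration (it resembles election-style voting), not a description of how any particular failover cluster implements quorum; `Node`, `request_vote`, and `run_round` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.voted_for = None  # at most one grant per round

    def request_vote(self, candidate):
        # Grant the vote only if it has not been given away yet.
        if self.voted_for is None:
            self.voted_for = candidate
        return self.voted_for == candidate


def run_round(nodes, links, candidates):
    """Each candidate votes for itself, then asks directly linked peers."""
    by_name = {n.name: n for n in nodes}
    for c in candidates:
        by_name[c].voted_for = c  # self-vote
    tally = {}
    for c in candidates:
        votes = 1  # the candidate's own vote
        for peer in nodes:
            linked = frozenset((c, peer.name)) in links
            if peer.name != c and linked and peer.request_vote(c):
                votes += 1
        tally[c] = votes
    return tally


# The scenario from the question: A-C and B-C work, A-B is broken.
nodes = [Node("A"), Node("B"), Node("C")]
links = {frozenset(("A", "C")), frozenset(("B", "C"))}
tally = run_round(nodes, links, candidates=["A", "B"])
print(tally)  # {'A': 2, 'B': 1}
```

In this sketch C's single vote goes to whichever candidate asks first (A here), so only one side reaches the 2-of-3 majority. A real cluster might pick by priority or another rule instead of arrival order.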

BE77Y
  • Thanks! 'it would not cast a vote to create quorum twice - that's the whole point of quorum. Each node only has one vote' - maybe I don't understand how voting works, but the way I think of it is that a node 'broadcasts' its vote to all nodes it can reach. In this case C can reach A and B, so A's and B's counters would both get +1'd by C's vote, wouldn't they? – user5539357 Apr 04 '23 at 13:07
  • That would be a very bad implementation that totally ignores the obvious edge case you use as a demonstration. As such, I would consider this... a bug. – TomTom Apr 04 '23 at 14:14
  • @TomTom then how does the voting process really work? What does it mean for a node to cast a vote? Doesn't it get broadcast to all other nodes? – user5539357 Apr 04 '23 at 14:20
  • @user5539357 no, there is no "+1". Each node can vote to create a quorum, they don't +1 each other. In your scenario, since C can talk to A and B, it basically decides which one it wants to create a cluster with (or arbitrarily picks, or is configured with priorities, etc). It might be helpful for you to read this post by an MS employee on understanding quorum: https://techcommunity.microsoft.com/t5/failover-clustering/understanding-quorum-in-a-failover-cluster/ba-p/371678 – BE77Y Apr 05 '23 at 07:18
  • The link you posted doesn't actually explain that and does not make any statements that would indicate it works the way you described, but I guess I'll have to trust you on that. I thought that each node broadcasts their vote, and also each node keeps track of how many votes it received, that's what I meant by +1-ing each other. So if we have 5 nodes and 2 of them are completely cut off from the remaining nodes, they would never receive more than 2 vote "packets" and would not achieve a quorum. That's my understanding of voting, the docs should be more precise on that I believe. – user5539357 Apr 05 '23 at 09:02
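The counting model described in the last comment can be sketched as follows. This models only that comment's assumption (every node counts itself plus each peer it can reach directly); it is not a claim about how any specific cluster product counts votes, and `reachable_counts` is a hypothetical helper.

```python
def reachable_counts(nodes, links):
    """For each node, count itself plus every directly linked peer."""
    return {
        n: 1 + sum(1 for m in nodes if m != n and frozenset((n, m)) in links)
        for n in nodes
    }


def link(a, b):
    return frozenset((a, b))


# 5 nodes, with D and E cut off from A, B, and C (majority is 3 of 5).
five = ["A", "B", "C", "D", "E"]
links5 = {link("A", "B"), link("A", "C"), link("B", "C"), link("D", "E")}
counts5 = reachable_counts(five, links5)

# 3 nodes from the question, with only the A-B link broken (majority is 2 of 3).
three = ["A", "B", "C"]
links3 = {link("A", "C"), link("B", "C")}
counts3 = reachable_counts(three, links3)

print(counts5)  # {'A': 3, 'B': 3, 'C': 3, 'D': 2, 'E': 2}
print(counts3)  # {'A': 2, 'B': 2, 'C': 3}
```

Under this model the 5-node partition behaves as the comment expects, since D and E never reach 3. In the 3-node case, though, A and B each count 2, which is exactly the concern raised in the question: counting reachable peers alone does not stop both sides from claiming a majority, so something extra, such as a vote that can only be granted once, is needed.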