
I've noticed a few peculiarities about MongoDB replica sets.

In a 3-node replSet, if the PRIMARY goes down, I see that the set elects a new PRIMARY and everything keeps working with no downtime. But if another member goes down (2 total down), the 1 remaining member does not become PRIMARY and a complete outage happens. I understand this is because the replSet does not have a majority for an election.

But this seems silly. Shouldn't my one surviving member be able to function on its own? Is there a way to configure it so that I get this behavior?

I understand that arbiters can be used to achieve a majority. However, if I add an arbiter for a total of 4 members, an even number, wouldn't this also run into problems with majorities? Or, if I add 2 arbiters for a total of 5 voting members but 1 goes down, wouldn't I be left with an even number of voting members and still be susceptible to the replSet not being able to elect a PRIMARY?

In general, I'm a little confused about how "majority" is established with respect to members going up or down, and what configuration options I have. My specific questions are:

  • How do I protect against an outage in a 3-node replSet when 2 members go down, and/or what is the best practice for safely remediating an outage that happens in this scenario?
  • In an odd-member replSet, what happens when an odd number of members go down and leave the replSet with an even number of members online (with respect to the replSet being able to do a majority election)?
CraexIt
  • There is, but I don't recommend it. The reason the set won't allow a single remaining member to keep dishing out the work as primary is that it cannot be certain whether it is legitimately still the primary or just a cut-off node. – Sammaye Dec 26 '14 at 19:24

2 Answers


How do I protect against an outage in a 3-node replSet when 2 members go down

You don't. If two members go down, your replica set becomes read-only, and rightly so. "Down" can be relative: server 1 may say that 2 and 3 are down, but really 1 is on the other side of a network partition. If server 1 guarded against an outage of two members, it would become primary and accept writes. However, one of 2 or 3 is also a primary, so now the set has two primaries. How do you reconcile the conflicting writes sent to 1 with those sent to the primary of 2 and 3 when the partition ends?

Probability is your shield against having a majority of replica set members down: if each server is down 1% of the time, and servers go down independently of one another (an assumption that's likely true except insofar as the servers are colocated), then any given pair of them will be down at the same time only about 1 time in 10,000. If you need better odds, use 5 servers in the replica set.
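
To put rough numbers on that probability argument, here's a back-of-the-envelope sketch in plain JavaScript (it also runs in the mongo shell); the 1% downtime figure and the independence assumption are purely illustrative:

    // Probability that a majority of an n-member replica set is down at once,
    // assuming each member is independently down with probability p.
    function binomial(n, k) {
      var r = 1;
      for (var i = 1; i <= k; i++) r = r * (n - i + 1) / i;
      return r;
    }
    function probMajorityDown(n, p) {
      var needed = Math.floor(n / 2) + 1;  // members that must be down to lose the majority
      var total = 0;
      for (var k = needed; k <= n; k++) {
        total += binomial(n, k) * Math.pow(p, k) * Math.pow(1 - p, n - k);
      }
      return total;
    }
    probMajorityDown(3, 0.01);  // ~3.0e-4 -> roughly 3 in 10,000
    probMajorityDown(5, 0.01);  // ~9.9e-6 -> roughly 1 in 100,000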

what happens when an odd number of members go down and leave the replSet with an even number of members online

The replica set needs a majority in order to elect a primary, where "majority" means more than half of the total number of members in the replica set configuration, not of the members any one node currently sees as up. If some group of replica set members, no matter whether it's an even or odd number, can see that they form a majority of the configured set, they will attempt to elect a primary. The majority condition guarantees there can be only one primary. So 8/11 members talking to each other will elect a primary as ably as 7/11 or 9/11.
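
For concreteness, here is a minimal sketch of how that majority threshold works out (the member counts are just examples):

    // The majority is computed from the number of voting members in the
    // replica set configuration, not from how many are currently reachable.
    function majority(votingMembers) {
      return Math.floor(votingMembers / 2) + 1;
    }
    majority(3);   // 2 -> a lone surviving member out of 3 cannot elect itself
    majority(11);  // 6 -> any 7, 8, or 9 reachable members can elect a primary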

wdberkeley
  • "If you need better odds, use 5 servers in the replica set." then what if they are separated into 2 machine : Machine A runs 3 node and Machine B runs 2 node. One day they lost their connection, doesn't it means that both machine now have their own primary ? Thx – DennyHiu Apr 02 '15 at 01:38
  • "So 8/11 members talking to each other will elect a primary as ably as 7/11 or 9/11." But in the case of 8/11, isn't there a chance of tied elections? While with 7/11 or 9/11 there is none? – Ikar Pohorský Jul 15 '16 at 06:32

What is the best practice for safely remediating an outage that happens in this scenario?

As the previous answer mentioned, MongoDB tries very hard to avoid having two primaries in a replica set, because that would produce conflicting writes that are very hard to reconcile. If you know that a node is down and not coming back, you can remove it from the replica set configuration. Even if you have only one surviving node, you can tell that node to remove the down members, so you end up with a one-node replica set, and that one node will become primary. Since there is no primary at that point, you have to use the "force" option of rs.reconfig() to remove the down nodes. After that, you can add new nodes to the replica set, and they will start copying data from the surviving node. You might also have to adjust your application's configuration to refer to the new nodes.
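
Here is a minimal sketch of that procedure in the mongo shell, assuming you are connected to the lone surviving member; the hostnames and member index are placeholders for your own values:

    // Fetch the current configuration from the surviving member.
    cfg = rs.conf()

    // Keep only the member(s) that are still alive. This assumes the
    // surviving node happens to be members[0]; adjust the index as needed.
    cfg.members = [cfg.members[0]]

    // There is no primary, so the reconfiguration must be forced.
    rs.reconfig(cfg, {force: true})

    // Once the lone node has become primary, bring up replacements and
    // add them back to the set (placeholder hostnames):
    rs.add("newhost1.example.net:27017")
    rs.add("newhost2.example.net:27017")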