We have 4 servers located in two data centers, connected by a direct line. We are planning to switch from MariaDB to MariaDB Galera Cluster because circular replication is a pain.

A load balancer decides which of the four servers handles each incoming request.

My concern is what happens when the connection between the data centers goes down. Will the two nodes in each data center form a mini cluster until the connection is back, and then reorganize themselves into a 4-node cluster again?

How does Galera prevent primary key duplication issues that may occur when the connection between the two data centers fails?

Imagine that the load balancer keeps forwarding requests to both data centers and inserts take place in both of them, which may result in PK duplication issues when the connection comes back.

I have tried to simulate the different cases using VMware Workstation. I managed to create a 4-node replication cluster and to take nodes out of the cluster and bring them back in, but I have no idea how to simulate separate data centers.

thedethfox

1 Answer

The easiest way to simulate your situation is to kill both nodes in ONE data center simultaneously. Another possibility is to enable firewall rules simultaneously so that the two sides can no longer reach each other.
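As a rough illustration of the firewall approach, the sketch below only builds the `iptables` commands you would run as root on each node of one data center to cut it off from the other side. The IP addresses and the helper names `partition_rules` and `heal_rules` are made up for this example; adapt them to your VMs.

```python
def partition_rules(remote_ips):
    """Build iptables commands that drop all traffic to and from the given peers."""
    rules = []
    for ip in remote_ips:
        rules.append(f"iptables -A INPUT -s {ip} -j DROP")
        rules.append(f"iptables -A OUTPUT -d {ip} -j DROP")
    return rules


def heal_rules(remote_ips):
    """Build the matching delete commands that restore the link."""
    return [cmd.replace("iptables -A", "iptables -D", 1)
            for cmd in partition_rules(remote_ips)]


# Hypothetical addresses of the two nodes in the "other" data center.
DC2 = ["192.168.56.21", "192.168.56.22"]

if __name__ == "__main__":
    print("# run on every node in data center 1 to cut the link:")
    for cmd in partition_rules(DC2):
        print(cmd)
    print("# run later to bring the link back:")
    for cmd in heal_rules(DC2):
        print(cmd)
```

Applying the rules on both nodes of one side at about the same time should look to Galera like a failed inter-site link, while the nodes inside each data center can still see each other.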

Galera Cluster is a pessimistic cluster. This means that data integrity has the highest priority, not availability (in contrast to master/master replication, which is an optimistic cluster: availability has priority over data integrity).

In your case the Galera nodes detect that some other nodes are missing. As a next step, each side of your Galera Cluster tries to find a quorum (a majority, more than half of the members). This will fail because the link between the 2 data centers is down. By default the quorum is defined as "more than half", which in your case means 3 of 4 nodes. Neither side can reach the quorum (2 < 3). Both sides then fall into the non-primary state (often loosely called split-brain) and refuse all queries (except SHOW and SET).
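The quorum arithmetic can be sketched in a few lines. This is a simplified model, not Galera's actual implementation: it assumes every node has equal weight and that all missing nodes disappeared without leaving gracefully. The function names are made up for this example.

```python
def has_quorum(reachable, last_known_size):
    """True if a partition sees strictly more than half of the last known members."""
    return 2 * reachable > last_known_size


def partition_states(sizes):
    """Given the node count of each partition, report which side stays primary."""
    total = sum(sizes)
    return ["primary" if has_quorum(n, total) else "non-primary" for n in sizes]


if __name__ == "__main__":
    # 4 nodes split 2 + 2: neither side has more than half.
    print(partition_states([2, 2]))
    # 5 nodes split 3 + 2: the larger side keeps working.
    print(partition_states([3, 2]))
```

With an even 2 + 2 split both sides end up non-primary, while adding a fifth vote lets one side keep a majority.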

For the application it looks like the whole cluster is down.

Because of this concept, it is not possible to build a reliable and available cluster with only 2 locations or 2 nodes. You always need an odd number of nodes or data centers.

When the link comes back, Galera is often able to detect this and recover by itself. Otherwise you have to bring one side out of the non-primary state manually.

bummi
shinguz