
I am testing a two-node cluster, exercising various split-brain scenarios.

It is all working fine, with resources doing what I would expect, except I am wondering about timing in a particular case.

I am running with

auto_tie_breaker: 1
auto_tie_breaker_node: lowest

lowest would be node 1 in my case.
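For context, those settings live in the `quorum` section of corosync.conf. A minimal sketch of the configuration described above (assuming corosync 2.x with votequorum; note that the votequorum(5) man page documents `two_node` mode as incompatible with `auto_tie_breaker`, so it is omitted here):

```
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}
```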

In the case where the resources are on node 1 and I cause a split-brain scenario everything seems fine.

However, when the resources are on node 2 and I cause a split-brain, the resources are shut down on node 2 and started on node 1, which makes perfect sense.

But I am wondering about the timing. Now that the two nodes cannot communicate with each other, what mechanism ensures that the resource on node 1 starts only after the resource on node 2 shuts down?

I can imagine putting a sleep in the startup script for the resource, but I was wondering if there is an officially endorsed way to handle this timing?

Or perhaps I am going about this the wrong way. Perhaps I should use some sort of stickiness, telling the nodes to continue doing what they were doing when a split-brain condition is detected. In other words, on split-brain detected, do not change anything.
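If you go the stickiness route, it can be expressed as a Pacemaker resource default rather than a sleep. A hedged sketch using pcs (exact syntax varies between pcs versions, and this only biases placement; it does not resolve the underlying quorum question):

```
# Make every resource strongly prefer to stay where it currently runs.
pcs resource defaults resource-stickiness=INFINITY

# Newer pcs releases use the "update" subcommand instead:
#   pcs resource defaults update resource-stickiness=INFINITY
```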

  • The potential timing issue I am worried about in my post does not occur if I adopt "on split-brain" do nothing. I brought up the question due to my current config described above and watching how it behaved. Also I saw a fair number of configuration examples using the above config, but I am starting not to like it for the very reason that it creates this window of uncertainty. – user3718260 Sep 20 '19 at 00:09

2 Answers


tl;dr -- get a third vote for your cluster via corosync-qdevice

The problem with the OP (I am the author of the OP) is the rather large amount of detail that is not contained in the original question.

After another couple of days of testing and reading, it is clear to me that there are plenty of wrong ways to build a two-node HA cluster described on the Internet.

It should be pretty easy to understand the problem with my first question, the one about stopping the service on one node while starting it on the other. By definition, the nodes are in a split-brain situation and cannot know the state of the other, so no amount of delay or waiting will fix the situation with a high degree of reliability. Further, once the cluster is in split-brain, node 2 only thinks the service is starting on node 1; it has no idea whether it actually did. Thus, when node 2 goes offline, the service may be unavailable.

The second point of my question ends by suggesting that perhaps some sort of token system would work: in other words, do nothing. It works less than optimally because if the node holding the token dies, split-brain ensues and a perfectly good standby sits idle because it chooses to do nothing.

I finally settled on a third-party quorum device, which is only slightly different from having a three-node or odd-numbered cluster. I used corosync-qdevice and corosync-qnetd.

I now have a third vote in my two-node cluster. It solves the split-brain problems referred to in my OP and eliminates the impossible-to-solve two-node edge cases.
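For anyone following the same path, the rough shape of the setup looked like the following. Package names, the host name, and the `ffsplit` choice are from my environment and may differ in yours; the commands are pcs-based and vary by distribution:

```
# On a third machine, outside the cluster, run the arbiter:
yum install corosync-qnetd pcs
pcs qdevice setup model net --enable --start

# On both cluster nodes, point corosync at it:
yum install corosync-qdevice
pcs quorum device add model net host=qnetd.example.com algorithm=ffsplit

# Verify that the cluster now sees a third vote:
pcs quorum status
```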

That leaves STONITH, the mighty hammer that all cluster gods wield. If I were to build a cluster from scratch, I would use STONITH in addition to my resource blocking.

However, I inherited this particular cluster; it runs on shared hardware, and while it does have IPMI and other goodies, I am not allowed to turn the power off.

As to virtual STONITH, I am not convinced it is really any different from resource blocking, but that should be a different question.

So if you are here struggling to set up an HA two-node cluster with resource fencing, read up on corosync-qdevice and add a third vote to your cluster.


Ideally you would configure and enable STONITH devices for Pacemaker to use. Otherwise, there is no guarantee that services will actually stop on node 2 before node 1 attempts to take over. A sleep/delay might give the service extra time to stop, but if it fails to stop for any reason, node 1 will still attempt to take over services.

STONITH devices are configured to use an out-of-band communication channel to forcibly remove an unresponsive or disconnected peer from the cluster (usually by powering it off).
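As a rough illustration (device names, addresses, and credentials below are placeholders, not from the question), an IPMI-based fence device per node could be created with pcs along these lines:

```
pcs stonith create fence-node1 fence_ipmilan ip=10.0.0.11 \
    username=admin password=secret pcmk_host_list=node1
pcs stonith create fence-node2 fence_ipmilan ip=10.0.0.12 \
    username=admin password=secret pcmk_host_list=node2

# Keep each node from being responsible for fencing itself:
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
```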

Once you've implemented STONITH in a Pacemaker cluster, you have eliminated nearly all cases where you could be left with a hung or split cluster.

Matt Kereczman
  • I like STONITH, but in my test case, STONITH is not an option. I am stuck with resource fencing. – user3718260 Sep 20 '19 at 00:01
  • Unfortunately, there are some scenarios you simply cannot account for without STONITH. This is going to be one of them. Why is STONITH not an option? – Matt Kereczman Sep 20 '19 at 17:50