I'm running a two-node DRBD cluster (active/passive), managed by the drbd systemd service plus a small script that mounts the volumes whenever a resource becomes primary.

I want to configure DRBD so that it always resolves a split brain automatically and there is always at least one primary node able to serve, as long as the two machines are not both down.

I tried the following configuration (with the pri-lost-after-sb handler set to "reboot"):

after-sb-0pri discard-younger-primary;
after-sb-1pri discard-secondary;
after-sb-2pri call-pri-lost-after-sb;

I also tried on-suspended-primary-outdated force-secondary and some other combinations.
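
For context, here is a minimal sketch of how the settings above would typically sit in a DRBD resource file. The resource name r0 is a placeholder, and the rest of the resource definition (hosts, devices, disks) is omitted. Placing on-suspended-primary-outdated in the options section is an assumption based on the DRBD 9 drbd.conf layout, not a copy of the actual file.

```
# Sketch only: "r0" is a hypothetical resource name; "on" host sections,
# volumes and disks are left out.
resource r0 {
  options {
    # Assumed placement (DRBD 9.1+). As far as the documentation describes it,
    # this setting only matters when on-no-quorum is set to suspend-io.
    on-suspended-primary-outdated force-secondary;
  }

  net {
    # Automatic split-brain recovery policies from the question
    after-sb-0pri discard-younger-primary;
    after-sb-1pri discard-secondary;
    after-sb-2pri call-pri-lost-after-sb;
  }

  handlers {
    # Handler invoked by call-pri-lost-after-sb; "reboot" as stated above
    pri-lost-after-sb "reboot";
  }
}
```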

But I always find a scenario in which the cluster ends up in a bad state and does not recover from the split brain. Usually the nodes end up StandAlone and the secondary shows force-io-failures (so after another failure of the primary, this secondary will not work even if it is connected).

Is there anything else I can do to make this setup more robust, given that I prioritize service uptime over avoiding data loss?

eagr