NFS Server - Corosync, DRBD & Pacemaker

Question

I have two HA NFS servers (Ubuntu with Corosync, Pacemaker and DRBD).

Everything is working beautifully with one small issue.

If I kill an NFS node, it fails over seamlessly (NICE!).

As the killed node comes back up, it causes a 5-10 second disconnection of the NFS share (presumably as it's rejoining the cluster).

Has anyone seen this? Any ideas on how to resolve it so that the experience is seamless as the node rejoins the cluster?

You need to check the pacemaker documentation about stickiness — c4f4t0r, Jul 07 '22 at 10:12
there is a default resource stickiness that _should_ be enough to keep things from failing back if all other things are in good shape. It sounds like you might be seeing a resource recovery... maybe you forgot to disable some cluster controlled services from starting at boot. — Matt Kereczman, Jul 08 '22 at 16:39

score 2 · Answer 1 · answered Dec 19 '22 at 05:38

Try setting "wait_for_leasetime_on_stop" to "true".
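As a rough sketch of what that might look like: `wait_for_leasetime_on_stop` is a parameter of the `ocf:heartbeat:exportfs` resource agent, so this assumes the export is managed by that agent and that the crm shell is installed. The question states neither. The resource ID `p_exportfs_nfs` is hypothetical; substitute the ID from your own configuration.

```shell
# Hypothetical resource ID; replace with the exportfs primitive from your own cluster.
RSC="${RSC:-p_exportfs_nfs}"

# Build the crm command that sets wait_for_leasetime_on_stop=true on a resource.
# By default the command is only printed; set APPLY=1 to actually run it.
set_leasetime_wait() {
    cmd="crm resource param $1 set wait_for_leasetime_on_stop true"
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

set_leasetime_wait "$RSC"
```

On a cluster managed with pcs instead, `pcs resource update <resource-id> wait_for_leasetime_on_stop=true` should have the same effect.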

BaronSamedi1958

score -1 · Answer 2 · answered Jul 08 '22 at 16:45

That sounds like Pacemaker is performing a recovery, possibly because it finds the service running on both nodes once the killed node rejoins.

If /var/log/syslog contains a message from the pengine process along the lines of "active on 2 nodes attempting recovery", make sure you've disabled the nfs-server service in systemd on both nodes, so that only Pacemaker starts it:

systemctl disable nfs-server

Be sure to check the logs on both nodes.
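The check above can be sketched as a small script to run on each node. It assumes the log path from this answer and matches only the fragment "active on 2 nodes", since the exact wording of the message may differ between Pacemaker versions. The `found_dual_active` helper is just an illustrative name.

```shell
# Log path taken from the answer; override LOG on systems that log elsewhere.
LOG="${LOG:-/var/log/syslog}"

# Return 0 if the given log file mentions a resource being active on two nodes.
found_dual_active() {
    grep -q "active on 2 nodes" "$1" 2>/dev/null
}

if found_dual_active "$LOG"; then
    echo "dual-active recovery message found in $LOG"
else
    echo "no dual-active message found in $LOG"
fi

# Show whether systemd would start nfs-server at boot; it should not report "enabled".
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-enabled nfs-server 2>/dev/null || true
fi
```

If the second check prints "enabled" on either node, that node will start NFS on its own at boot, before Pacemaker decides where the resource belongs.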
