I have two HA NFS servers (Ubuntu with Corosync, Pacemaker, and DRBD).

Everything is working beautifully with one small issue.

If I kill an NFS node, it fails over seamlessly (NICE!)

As the killed node comes back up, it causes a 5-10 second disconnection of the NFS share (presumably as it's rejoining the cluster).

Has anyone seen this? Any ideas on how to resolve it so that the experience stays seamless as the node rejoins the cluster?

  • Check the Pacemaker documentation on resource stickiness. – c4f4t0r Jul 07 '22 at 10:12
  • There is a default resource stickiness that _should_ be enough to keep things from failing back if everything else is in good shape. It sounds like you might be seeing a resource recovery; maybe you forgot to disable some cluster-controlled services from starting at boot. – Matt Kereczman Jul 08 '22 at 16:39

2 Answers

Try setting the "wait_for_leasetime_on_stop" parameter on your exportfs resource to "true".
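As a rough sketch of how that could be applied with crmsh, assuming the export is managed by the ocf:heartbeat:exportfs agent (which is where this parameter lives, as far as I know). The resource name `p_exportfs_data` is hypothetical, so substitute the ID of your own primitive. The script only prints the command unless `APPLY=1` is set:

```shell
#!/bin/sh
# Hypothetical resource ID; replace with the ID of your exportfs primitive.
RSC="${RSC:-p_exportfs_data}"

# Build the crmsh command that sets wait_for_leasetime_on_stop=true.
# Dry run by default: the command is printed, and only executed when APPLY=1.
set_leasetime_wait() {
    cmd="crm resource param $1 set wait_for_leasetime_on_stop true"
    if [ "${APPLY:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
}

set_leasetime_wait "$RSC"
```

If you manage the cluster with pcs rather than crmsh, `pcs resource update <resource> wait_for_leasetime_on_stop=true` should be the equivalent.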

BaronSamedi1958

That sounds like Pacemaker is performing a recovery, possibly after finding the service is running on both nodes after the killed node rejoins.

If you see a message in /var/log/syslog from the pengine process (called pacemaker-schedulerd in Pacemaker 2.x) saying something like "active on 2 nodes attempting recovery", make sure you've disabled nfs-server in systemd on both nodes:

systemctl disable nfs-server

Be sure to check the logs on both nodes.
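Here is a minimal sketch of that check, to be run on each node. The log path defaults to /var/log/syslog as above, and the pattern is deliberately loose because the exact message wording differs between Pacemaker versions. The helper name `count_multi_active` is my own, not part of any tool:

```shell
#!/bin/sh
# Count log lines suggesting a resource was found active on more than one node.
# Loose, case-insensitive match; the exact wording varies by Pacemaker version.
count_multi_active() {
    grep -Eic 'active on [0-9]+ nodes' "$1"
}

LOG="${LOG:-/var/log/syslog}"
if [ -r "$LOG" ]; then
    echo "multi-active messages in $LOG: $(count_multi_active "$LOG")"
fi

# Show whether systemd would start nfs-server at boot.
# For a cluster-managed service this should report "disabled".
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-enabled nfs-server 2>/dev/null || true
fi
```

A non-zero count together with `enabled` on either node would be consistent with systemd starting NFS at boot before Pacemaker takes over.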

Matt Kereczman