We run a high-availability NFS environment built on DRBD and Heartbeat, with NFS exported to clients (similar to https://help.ubuntu.com/community/HighlyAvailableNFS ). This seems to be a common and well-supported way of doing HA NFS, and it works really well for us with one exception.
When Heartbeat performs the switchover, all the NFS clients hang for approximately 60-120 seconds. I can see that Heartbeat takes only 5-10 seconds to complete the takeover and bring NFS up (I can even mount it manually). But the connected clients seem to wait for some sort of timeout before they re-establish a working connection.
I've tried the following without success:
- Ensured that /var/lib/nfs is stored on the DRBD disk and symlinked back to /var/lib
- UDP or TCP client connections
- Defining the fsid in the NFS server export
- Varying the client timeo= mount option
- Hard/Soft mounts
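For context on the timeo= attempts in the list above: nfs(5) gives timeo in tenths of a second, with a default of 600 (60 seconds) for NFS over TCP and a linear backoff capped at 600 seconds. Here is a minimal sketch of that arithmetic. The function name and the retrans value passed in are illustrative assumptions, not values from my configuration.

```python
def tcp_retry_waits(timeo_ds, retrans):
    """Return the wait in seconds before each retry of one NFS request over TCP.

    Assumes the linear backoff described in nfs(5): each retransmission
    lengthens the wait by timeo, up to a 600 second ceiling.
    """
    max_wait_s = 600.0
    base_s = timeo_ds / 10.0  # timeo is expressed in deciseconds
    return [min(base_s * attempt, max_wait_s) for attempt in range(1, retrans + 1)]


# Default TCP timeo per nfs(5); retrans=2 is an assumed value for illustration.
print(tcp_retry_waits(600, 2))
```

If the stall were purely a timeo effect, I would expect lowering timeo to shrink it, which is not what I am seeing.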
Setup is as follows:
- NFSv4
- Ubuntu LTS servers and clients
- Current client mount options: proto=tcp,noauto,bg,intr,hard,noatime,nodiratime,nosuid,noexec
Notes
- I've noticed that /var/lib/nfs/rmtab is always empty and I can't work out why. Could this be the reason?
- Clients are GUI-less Ubuntu 10.04 LAMP stack servers.
- When the client stalls, any program that tries to access the share stalls too. For example, running "df" hangs the SSH session at the NFS mount line until NFS comes back.
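To put a number on the client-side stall rather than eyeballing "df", something like the following could run on a client during a failover. It is only a sketch: it assumes Python 3.6 or later is available on the client, and the function names, thresholds and mount point are all hypothetical.

```python
import os
import time


def timed_statvfs(path):
    """Return how many seconds one statvfs() call on path took.

    On a hard NFS mount the call blocks until the server answers,
    so a long return value corresponds to a client-side stall.
    """
    start = time.monotonic()
    os.statvfs(path)
    return time.monotonic() - start


def watch(path, interval_s=1.0, slow_s=2.0, rounds=None):
    """Probe path repeatedly and report every call slower than slow_s.

    rounds=None probes until interrupted; a number limits the probes.
    Returns the list of stall durations that were observed.
    """
    stalls = []
    done = 0
    while rounds is None or done < rounds:
        elapsed = timed_statvfs(path)
        if elapsed >= slow_s:
            stalls.append(elapsed)
            print(f"{time.strftime('%H:%M:%S')} statvfs blocked for {elapsed:.1f}s")
        time.sleep(interval_s)
        done += 1
    return stalls
```

Calling watch("/mnt/share") (a placeholder path) just before triggering the switchover should print one line per stalled probe, which makes it easier to compare the effect of different mount options.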
Any advice would be most welcome.