I have many automount entries in LDAP so that ~200 nodes can mount each other. The automounter often stops working. After debugging, I found that it can be fixed by running

rm -f /etc/mtab~*
restart autofs

which makes me think that some mount.nfs processes fail to remove their lock files (they are sometimes named /etc/mtab~.[0-9]*), which prevents subsequent mount requests from succeeding. As a workaround I have a cron job that removes the lock files, but it sometimes runs too late.
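
A minimal sketch of a more careful version of that cron cleanup. It assumes the numeric suffix of /etc/mtab~.[0-9]* is the PID of the mount process that created the file (a guess, not something confirmed here) and that the job runs as root, so that `kill -0` can probe any process. The function name `clean_stale_mtab_locks` is made up for illustration. It only removes lock files whose PID no longer exists and leaves a plain /etc/mtab~ untouched.

```shell
#!/bin/sh
# Remove mtab~.<PID> lock files whose creating process is gone.
# Assumption: the numeric suffix is the PID of the mount process.
clean_stale_mtab_locks() {
    dir="${1:-/etc}"
    for lock in "$dir"/mtab~.[0-9]*; do
        [ -e "$lock" ] || continue        # glob matched nothing
        pid="${lock##*.}"                 # text after the last dot
        case "$pid" in
            ''|*[!0-9]*) continue ;;      # skip non-numeric suffixes
        esac
        # kill -0 sends no signal; it only tests whether the PID exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            rm -f -- "$lock"
            echo "removed stale lock: $lock"
        fi
    done
}
```

Called from cron as `clean_stale_mtab_locks /etc`, this avoids deleting the lock of a mount that is still in progress, which a blanket rm -f cannot guarantee.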

Now the details:

System: Linux 2.6.32-33-server #70-Ubuntu SMP Thu Jul 7 22:28:30 UTC 2011 x86_64 GNU/Linux
automount -V: 5.0.4
mount.nfs -V: linux nfs-utils 1.1.6

LDAP entries (relevant lines):

objectClass: automount
cn: myhost
automountInformation: -soft myhost:/var/tmp

A successful mount looks like this (mount | grep auto):

myhost:/var/tmp on /var/autofs/net/myhost type nfs (rw,soft,sloppy,addr=10.x.x.x)

Whenever I access a new mountpoint, the automounter spawns the following processes (ps .. | grep mount):

/bin/mount -t nfs -s -o soft -f myhost:/var/tmp /var/autofs/net/myhost
/sbin/mount.nfs myhost:/var/tmp /var/autofs/net/myhost -s -f -o rw,soft

If I try to run the command manually, I get the following message:

Cannot create link /etc/mtab~
Perhaps there is a stale lock file?

Every subsequent request to the automounter then fails.

My question is: how can I make the automounter behave correctly when mount.nfs processes fail to remove the mtab~ lock file, so that all my hosts can mount each other? Is it a matter of LDAP, NFS, automounter, or mount.nfs options?

Please help me get to the bottom of this problem!

aszorro
  • The number suffixes are almost certainly PIDs. So you know the PID of the stuck mountd, probably. Try running a system call tracer on them to find out what's up with them. – James Youngman Mar 03 '12 at 22:27
  • That was also my guess, but there are no processes with those PIDs (and no other "mount" instances in memory either). – aszorro Mar 03 '12 at 22:59
  • In that case I would fall back on generic troubleshooting; for example running the system call tracer on automountd to find out why it appears to be stuck. – James Youngman Mar 05 '12 at 10:34
