
I am writing a C program for Linux that reads and writes to files on an NFS server. The share is mounted hard; attempts to access it will block indefinitely until they work. Having my program block indefinitely is bad; it is still capable of doing useful work even if the files are unavailable. Remounting the share soft is not an option.

Having two processes, one of which does work and won't block, and another that handles file IO and might block is an option, but would constitute a Major Change. I'd like to avoid that. Really, I want to say, "I know that you're hard mounted so that naive programs can pretend you're a highly reliable local disk. But I know better and I am prepared to cope with any access failing, similar to the behavior if you were soft mounted." So:

In C, how can I access files on a hard-mounted NFS share, getting errors if the server is unavailable instead of blocking indefinitely?

I can run as root if necessary, but would prefer not. Using root to remount the share is right out. I can potentially rely on new features, but the further back support goes the better.

My research suggests the answer is that it's just not possible, but perhaps I've missed something.

Alan De Smet
    Why not set up a timeout signal to interrupt the blocking call after a suitable timeout. Start the timer whenever you do a blocking call (`open()`, `fopen()`, `readdir()`, `chdir()`, `stat()`, etc.), and if the call returns with `errno==EINTR`, and your timeout signal handler set a flag, you can tell that the call didn't progress in the allotted time, and treat it as if the call had errored out. The added overhead of the timeout signal handling is well worth it; it's common practice in server software like Apache. Make the timeouts user-configurable, and you're set. – Nominal Animal Feb 09 '14 at 23:51
  • That's a potential workaround, but is it reliable? I believe an unavailable NFS mount might put the process into uninterruptible sleep, in which case the signal itself will be blocked. – Alan De Smet Feb 10 '14 at 04:40
  • Ah. On kernels prior to 2.6.25 the `intr` mount option makes that reliable, but it is ignored on later kernels; `SIGKILL` is the only thing that works. So you'd need to use a child process to access the mount, returning the descriptor (file or directory) via a Unix domain socket ancillary message. The original process uses a timeout to `KILL` the child if the operation takes too long. Using a long-lived minimal child that forks a sub-child for each operation should reduce the overhead to acceptable levels. Maybe parse `/proc/mounts` before each access so you only apply the workaround to NFS, too. – Nominal Animal Feb 10 '14 at 15:34
  • Actually, you'd have to have the child (or sub-child) provide the data stream instead, since the filesystem may become inaccessible at any time. Just providing the descriptor only protects opening, not accesses, from getting stuck in uninterruptible sleep. Also, the timeout would be slightly more complicated: one for initial access, and further ones for each individual data packet. – Nominal Animal Feb 10 '14 at 15:42

1 Answer


You've not missed anything: you will never receive "server unavailable" errors, because the kernel never delivers them on hard-mounted NFS mount points.

Because the hard option is a property of the mount point, individual applications can't pick and choose; the kernel simply isn't designed to behave that way.

However, you do mention that you can run the application as root. Why not mount the file system soft at a second mount point, and get the behaviour you expect there?

Anya Shenanigans
  • Doing a soft-mount elsewhere is definitely a valid option for some environments! Sadly, I can't assume that the sysadmin will be okay with new mounts showing up. The goal is maximal Just Working with minimal surprises. Not possible is a valid answer given my constraints. Accepted. – Alan De Smet Feb 10 '14 at 04:37