I have a server A which stores files, and a server B with an NFSv3 mount of server A. When server A fails to respond for any reason, any read request made on B hangs for a long time (several minutes).

I need these requests to fail very quickly (within 1-2 seconds).

I tried tweaking mount options such as timeo, retrans, retry, soft/hard, sync/async, etc., but nothing works well, and it seems to be a known, unsolved problem. I always get a very long timeout.

  1. Is there any fix for the NFS client, or an alternative userland client?

  2. Is there any network file-sharing protocol (other than NFS) that properly handles broken connections and unavailable servers?

Mmmh mmh
Antares

2 Answers

1

NFS is a pretty solid protocol, especially when dealing with smaller setups (20 servers or fewer). I would use soft mounts to avoid issues when server A fails. If you want to quickly disconnect the mount, a quick play with iptables should cause the connection to time out fairly quickly and allow you to umount (assuming 10.10.0.1 is the IP of your NFS server A):

iptables -A OUTPUT -d 10.10.0.1 -j REJECT

There are several other file sharing protocols out there, but none as ubiquitous as NFS, IMHO.

vmfarms
  • Yes, but your solution requires doing the trick manually. I'd like something fully automated, with auto-remount when server A comes back. I am currently trying remoteFs, based on FUSE, and it seems to fit my needs perfectly: when the connection to A is broken, a request on B fails in 250 ms. And when A comes back, it instantly works from B. – Antares Aug 15 '10 at 16:27
  • Maybe I spoke too soon: remoteFs returns quickly only when the remote server is alive (responding to TCP/IP but without the RFS server running). If the machine is fully down (not responding), the request returns after 20 sec. I am continuing my tests ... – Antares Aug 15 '10 at 16:37
  • How about trying sshfs (http://fuse.sourceforge.net/sshfs.html)? It uses FUSE and seems to work well for me in certain circumstances. It's resilient enough to automatically re-establish the mount when the connection is dropped and restored, and it can be easily disconnected when the other end is dead. – vmfarms Aug 15 '10 at 17:14
1

Have you tried AFS or Gluster?

(@vmfarms: sshfs is a good solution if you don't mind the performance cost. It adds too much overhead to your network.)

Nikolaidis Fotis
  • I haven't tried AFS or Gluster. I tried sshfs and it seems to have the same problem as remoteFs: a command like 'ls /my/temote/mount' takes around 20 sec when the remote server is down. But I modified the source code to set a socket timeout of 1 sec, and with this patch it works great for now. Thanks for your responses! – Antares Aug 18 '10 at 18:22
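
The sshfs patch itself is not shown in the thread, so the following is only a sketch of the same idea (a 1-second socket timeout), written as a standalone Python probe rather than as a change to any client. The function name `server_reachable` is hypothetical. It does not make an already-hung NFS or sshfs request return; it only lets a script on B decide quickly whether touching the mount is likely to block.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; not part of NFS, sshfs, or remoteFs.
    """
    try:
        # The timeout bounds the connect attempt, so a host that is fully
        # down (no reply at all) costs about `timeout` seconds instead of
        # the much longer default.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (host up, service not running),
        # unreachable networks, and connect timeouts (host not responding).
        return False
```

For example, `server_reachable("10.10.0.1", 2049)` would probe the standard NFS port on the example address used in the first answer, and port 22 would be the one to probe for an sshfs mount. Passing an IP address rather than a hostname keeps DNS resolution, which this timeout does not cover, out of the picture.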