Does anyone have best practices for mounting an NFS share from Sun (Oracle) Unified Storage? We use the usual hard and nfs4 options on Debian Squeeze, and we run our Xen VMs from this NFS share. When our SAN lost a disk on Friday and started resilvering (rebuilding), all the NFS shares stalled and one of our Dom0s crashed quite badly along with the NFS share, taking a lot of VMs down. Are there any mount options that would make the client handle this kind of failure more gracefully?

hesten

2 Answers

I don't know much about Debian, nor about NFSv4.

But if the mount options are still the same as with NFSv3, my favorites (for any NFS client mount on any OS) are:

  • hard (keep retrying indefinitely rather than returning an I/O error to the application)
  • bg (if the first mount attempt fails, keep trying in the background without blocking anything "behind" the mount)
  • intr (if you really want to, you can kill the mount without rebooting your client)
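
As a rough sketch of how these options fit together on a Linux client, the script below only prints the mount command (a dry run) so it can be reviewed before anything is mounted. The server name, export path and mount point are hypothetical placeholders, not values from the question. intr is left out here because, per nfs(5), Linux kernels after 2.6.25 ignore it.

```shell
#!/bin/sh
# Dry-run sketch: prints the mount command instead of executing it.
# All three names below are hypothetical placeholders.
NFS_SERVER="filer.example.com"   # hypothetical storage host
NFS_EXPORT="/export/vms"         # hypothetical export path
MOUNT_POINT="/srv/xen-vms"       # hypothetical local mount point

# hard: retry indefinitely instead of returning an I/O error.
# bg:   if the first mount attempt fails, keep retrying in the background.
NFS_OPTS="hard,bg"

build_mount_cmd() {
    # $1 = mount options, $2 = server, $3 = export, $4 = mount point
    printf 'mount -t nfs4 -o %s %s:%s %s\n' "$1" "$2" "$3" "$4"
}

build_mount_cmd "$NFS_OPTS" "$NFS_SERVER" "$NFS_EXPORT" "$MOUNT_POINT"

# The equivalent /etc/fstab entry would look like this (same placeholders):
# filer.example.com:/export/vms  /srv/xen-vms  nfs4  hard,bg  0  0
```

Run the printed command as root once the placeholders match your environment.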

rsize and wsize are tuned to reasonable sizes by default nowadays; check your local man page.

Before that, I used "wsize=32768,rsize=32768" to get better transfer rates.
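
To see which rsize and wsize a Linux client actually negotiated (they can differ from what was requested), one option is to read them back from /proc/mounts. The helper below is a minimal sketch; the function name nfs_transfer_sizes is made up for this example.

```shell
#!/bin/sh
# Reads /proc/mounts-style lines on stdin and prints, for every NFS mount,
# the mount point followed by the rsize/wsize values found in its options.
# The function name is hypothetical, chosen for this example only.
nfs_transfer_sizes() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opts, ",")
        r = "rsize=?"; w = "wsize=?"
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^rsize=/) r = opts[i]
            if (opts[i] ~ /^wsize=/) w = opts[i]
        }
        print $2, r, w
    }'
}

# On a Linux client, feed it the kernel's view of the mounted filesystems.
if [ -r /proc/mounts ]; then
    nfs_transfer_sizes < /proc/mounts
fi
```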

You also have to take care on the NFS server side (if NFSv4 is still the same as NFSv3 here):

  1. First, start all NFS services
  2. Last, bring up the service IP for the NFS services

Otherwise the client will try to reconnect to a service IP with no NFS service behind it yet, and will fail instead of retrying.

BTW, what does the SAN have to do (in this case) with Sun Unified Storage? What happened when you "lost" your SAN? Why did the rebuild process break things? Was the storage not redundant?

Nils
  • FWIW, Linux no longer supports the intr option. From nfs(5): "The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels and if specified, this mount option is ignored to provide backwards compatibility with older kernels." – janneb Jun 13 '11 at 18:40

I had a similar problem with XenServer not long ago and did some research on it. Apparently, for some reason, XenServer uses soft mounts with a relatively short timeout for its NFS mounts. Some people suggested modifying the mount script directly on the XenServer host, since the mount options are not configurable in any other way; apparently this is the only way. We do not have this problem anymore since we are 100% on VMware now, and it is more resilient to NFS slowdowns.
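
Before editing any mount script, it may be worth confirming whether a host really has soft NFS mounts. The sketch below lists them from /proc/mounts together with their timeo (in tenths of a second) and retrans values. The function name nfs_soft_mounts is made up for this example, and it assumes the host exposes the usual Linux /proc/mounts.

```shell
#!/bin/sh
# Reads /proc/mounts-style lines on stdin and prints every NFS mount that
# carries the "soft" option, with its timeo and retrans settings.
# The function name is hypothetical, chosen for this example only.
nfs_soft_mounts() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opts, ",")
        soft = 0; t = "timeo=?"; r = "retrans=?"
        for (i = 1; i <= n; i++) {
            if (opts[i] == "soft") soft = 1
            if (opts[i] ~ /^timeo=/) t = opts[i]
            if (opts[i] ~ /^retrans=/) r = opts[i]
        }
        if (soft) print $2, t, r
    }'
}

# On the host in question: no output means no soft NFS mounts were found.
if [ -r /proc/mounts ]; then
    nfs_soft_mounts < /proc/mounts
fi
```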

The actual problem, however, lies with the reduced write performance of the underlying storage, and it depends heavily on your RAID controller (i.e. how much performance degrades during rebuilds). You can try playing with the rebuild priority settings of the array; however, it made no difference on my controller (Adaptec 5085). You can potentially improve the situation a little by buying more memory for the NFS server. This way the NFS daemon writes only the journal entry and keeps the data in the FS cache until better times, but again, it may or may not help depending on your situation.

I also noticed that this problem occurs more often on storage with parity (i.e. RAID-5 and RAID-6), so we try to use mirrored storage for our virtual machines whenever possible.

dtoubelis