
I am getting very inconsistent NFS performance between two Debian Wheezy machines, and I can't seem to nail it down.

Setup:

Machine 1 'video1': Dual 5506 with 12GB RAM, XFS on 8x3TB RAID6, exported as 'video1' from '/mnt/storage'

Machine 2 'storage1': Phenom X2 @ 3.2GHz with 8GB RAM, ZFS on 5x2TB, exported as 'storage1' from /mnt/storage1-storage

Local write performance:

mackek2@video1:/mnt/storage/testing$ dd if=/dev/zero of=localwrite10GB bs=5000k count=2000
2000+0 records in
2000+0 records out
10240000000 bytes (10 GB) copied, 16.7657 s, 611 MB/s
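
(I realize a 10GB dd on a 12GB box can be partly measuring the page cache; a run with conv=fdatasync, so dd flushes before reporting, would rule that out. Roughly:)

# same write, but force the data out to disk before dd reports a rate
dd if=/dev/zero of=localwrite10GB bs=5000k count=2000 conv=fdatasync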

Local read performance:

Both are connected to the same HP gigabit switch, and iperf gives a rock-solid 940Mbps both ways.
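(For reference, a plain iperf TCP test between the two boxes would be something like:)

# on video1 (server side)
iperf -s
# on storage1 (client side): default TCP test toward video1
iperf -c video1 -t 30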

My problem is that when I write to the video1 export from storage1, performance is all over the place. For the first few (5-7) gigs of a transfer (I'm hoping to move 30-120GB AVCHD or MJPEG files as quickly as possible), I see around 900Mbps, but then it drops to 150-180Mbps, sometimes as slow as 30Mbps. If I restart the NFS kernel server, performance picks back up for a few more gigs.

mackek2@storage1:/mnt/video1/testing$ dd if=/dev/zero of=remoteWrite10GB count=2000 bs=5000K
2000+0 records in
2000+0 records out
10240000000 bytes (10 GB) copied, 223.794 s, 45.8 MB/s
mackek2@storage1:/mnt/video1/testing$ dd if=/dev/zero of=remoteWrite10GBTest2 count=2000 bs=5000K
2000+0 records in
2000+0 records out
10240000000 bytes (10 GB) copied, 198.462 s, 51.6 MB/s
mackek2@storage1:/mnt/video1/testing$ dd if=/dev/zero of=bigfile776 count=7000 bs=2000K
7000+0 records in
7000+0 records out
14336000000 bytes (14 GB) copied, 683.78 s, 21.0 MB/s
mackek2@storage1:/mnt/video1/testing$ dd if=/dev/zero of=remoteWrite15GB count=3000 bs=5000K
3000+0 records in
3000+0 records out
15360000000 bytes (15 GB) copied, 521.834 s, 29.4 MB/s

When things are going fast, nfsiostat on the client shows average RTTs of a few milliseconds, but the RTT shoots up to over 1.5 seconds as soon as performance drops. Additionally, the CPU queue depth jumps to over 8 while the write is happening.
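(For what it's worth, this is roughly how I'm watching those numbers while a copy runs; the "CPU queue depth" is the run queue, i.e. the 'r' column in vmstat:)

# on the NFS client: per-mount op counts and RTTs, refreshed every 5 seconds
nfsiostat 5 /mnt/video1
# 'r' is the run queue depth mentioned above
vmstat 5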

Now, when reading from the same export, I get a beautiful 890Mbps, give or take a few Mbps, for the entire read.

mackek2@storage1:/mnt/video1/testing$ dd if=remoteWrite10GBTest2 of=/dev/null
20000000+0 records in
20000000+0 records out
10240000000 bytes (10 GB) copied, 89.82 s, 114 MB/s
mackek2@storage1:/mnt/video1/testing$ dd if=remoteWrite15GB of=/dev/null
30000000+0 records in
30000000+0 records out
15360000000 bytes (15 GB) copied, 138.94 s, 111 MB/s

The same thing happens the other way around with storage1 as the NFS server. CPU queue jumps up, speeds drop to crap, and I pull my hair out.

I have tried increasing the number of NFS daemons to as many as 64, and it still sputters out after a few gigs.
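(On Wheezy that's just RPCNFSDCOUNT in /etc/default/nfs-kernel-server on the server, followed by a restart, roughly:)

# /etc/default/nfs-kernel-server
RPCNFSDCOUNT=64
# then restart the kernel NFS server
service nfs-kernel-server restart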

Kyle M
  • ... async option on the export fixed it. Now to see if I can eke out a few more Mbps with jumbo frames. – Kyle M Jul 23 '12 at 17:52
  • Please post this as an answer and accept it when you can so that others know that the issue is solved. – mgorven Jul 24 '12 at 07:09

1 Answer


You don't include your mount or export options, so there are a number of NFS settings that could be impacting performance. I'd recommend trying the following options for maximum NFS performance and reliability (based on my experience); example /etc/fstab and /etc/exports entries are sketched below the list:

  • Mount Options: tcp,hard,intr,nfsvers=3,rsize=32768,wsize=32768

  • Export Options: async
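
As a rough sketch (paths taken from your question; the client network in the export is a placeholder, and rw/no_subtree_check are my usual additions, not something you posted), that would look something like:

# client /etc/fstab entry for the video1 export
video1:/mnt/storage  /mnt/video1  nfs  tcp,hard,intr,nfsvers=3,rsize=32768,wsize=32768  0  0

# server /etc/exports entry (192.168.1.0/24 is a placeholder for your LAN)
/mnt/storage  192.168.1.0/24(rw,async,no_subtree_check)

# apply: re-export on the server, then remount on the client
exportfs -ra
umount /mnt/video1 && mount /mnt/video1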

Christopher Cashell
  • Just a note, too, the mount options above are also the ones recommended by Oracle for running Oracle across NFS. – Christopher Cashell Oct 09 '12 at 22:19
  • I was able to also improve performance significantly with the above tip, however `nfsvers=3` wasn't necessary. – anarcat Jul 24 '14 at 18:31
  • @anarcat - For modern releases, you're correct, `nfsvers=3` won't matter as much. Linux NFS mounts used to default to NFS version 2 unless you specified version 3. In those cases, it was very worthwhile to explicitly set the version. In modern releases, mount will negotiate, starting with v4, then trying v3, then falling back to v2. – Christopher Cashell Jul 25 '14 at 16:23
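
(To check which version a client actually negotiated, nfsstat shows the per-mount options in effect, including vers=, for example:)

# on the client: mounted NFS filesystems and their effective options
nfsstat -m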