
I have a server running Ubuntu 12.10 with 12 disks attached, all of which I am sharing over my 10 gigabit network using NFSv4. However, I am getting generally poor performance over NFS compared to what I can get locally. The usual fix I have come across in my research for poor NFS performance is to use the async option in the server's exports file instead of sync, but that is simply not an option for my purposes. I understand that sync introduces a performance hit, but I would not expect one to the extent that I am seeing.
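
For reference, the export entries currently look roughly like this, one per disk (paths and client subnet simplified):

    # /etc/exports on the server -- every disk is exported with sync
    /export/disk01  10.0.0.0/24(rw,sync,no_subtree_check)
    /export/disk02  10.0.0.0/24(rw,sync,no_subtree_check)
    ...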

I find that the more disks I actively use on the NFS client, the worse my per-disk throughput is. For example, if I actively use only 1 disk, I can write at 60MB/s, but if I actively use all 12 disks, I can only write at 12MB/s per disk. Equivalent local tests yield 200MB/s per disk with no problem. Are there any tweaks that can be made to optimize multi-disk NFS performance? It does not appear that either the CPU or the memory is being utilized very much while the server is under load.
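
(For reference, the kind of test I'm talking about is a simple sequential write, run against one or several of the NFS mounts at a time; paths and sizes here are illustrative.)

    # write 4GB sequentially to one NFS-mounted disk, including the final flush in the timing
    dd if=/dev/zero of=/mnt/disk01/testfile bs=1M count=4096 conv=fsync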

Joe Swanson

1 Answer


It looks like the sync writes are the culprit here, and unfortunately there isn't much you can do about it when synchronous writes are a requirement for the system.

The problem is that the remote system writing the data has to wait for each filesystem block to be committed to stable storage before it can write the next one. With small block sizes this is detrimental to performance, as you have seen.
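
If you want to sanity-check that sync really is in effect before changing anything, the export flags and the server-side operation counts are easy to inspect on the server:

    exportfs -v    # lists each export with its effective flags (sync/async, wdelay, ...)
    nfsstat -s     # server-side RPC statistics, including per-operation WRITE/COMMIT counts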

There is no good solution to this problem, but here are some possible options to alleviate the bottleneck:

  • Increase the NFS block size (the rsize and wsize mount options) so that each operation is able to carry more data; see the example mount line below.

  • Get a separate fast SSD or NVRAM device for write caching/journaling. This will significantly improve your throughput for all workloads. With ext4 on Ubuntu this can be accomplished with the tune2fs(8) command, adding an external journal device with the -J parameter; the rough commands are sketched below.

  • Split the NFS exports into one set dedicated to sync writes and another exported async. That way you can put any non-critical data on the async exports and improve the throughput for that workload independently; see the example exports below.

  • Try a different filesystem that supports stable write caching natively. I use ZFS on FreeBSD on my SAN with an SSD-backed intent log (roughly equivalent to the journal on ext4). I have never tried ZFS on Linux, but it appears to be a fairly mature project now. Both my read and write throughput over iSCSI improved significantly after adding the SSDs. In case you are not familiar with ZFS: the purpose of the ZIL (ZFS Intent Log) is to land synchronous writes on fast, stable storage such as an SSD. The data is then periodically committed to the main pool in transaction groups so that it is not lost, and in the event of a power outage the outstanding writes can be replayed from the log to restore filesystem integrity. A zpool example is included below.
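
To make those options concrete, here is roughly what each one looks like. The device names, paths, pool name and client subnet below are placeholders, so substitute your own:

    # 1) Larger NFS block size: mount each export on the client with bigger rsize/wsize
    mount -t nfs4 -o rw,rsize=1048576,wsize=1048576 server:/export/disk01 /mnt/disk01

    # 2) External ext4 journal on an SSD (run on the server with the filesystem unmounted)
    mke2fs -b 4096 -O journal_dev /dev/ssd1   # block size must match the filesystem's
    tune2fs -O ^has_journal /dev/sdb1         # drop the existing internal journal
    tune2fs -J device=/dev/ssd1 /dev/sdb1     # attach the external journal

    # 3) Separate sync and async exports in /etc/exports on the server
    /export/critical  10.0.0.0/24(rw,sync,no_subtree_check)
    /export/scratch   10.0.0.0/24(rw,async,no_subtree_check)

    # 4) ZFS: add an SSD as a dedicated intent log (SLOG) device to an existing pool
    zpool add tank log /dev/ssd2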

I've encountered this issue in the past and have found no good way to eliminate it entirely. If you discover any other ways to mitigate it, please let me know!

Paccc