
I am testing a ZFS mirror with FIO (Flexible I/O Tester) to understand the random read scalability of ZFS mirrors. The primarycache and secondarycache properties have been set to none because the application I use performs its own caching.
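
For reference, disabling both caches on the pool amounts to something like this (pool name as created further below):

# set the ARC (primarycache) and L2ARC (secondarycache) policies to none for the pool's root dataset
zfs set primarycache=none zfs-raid1
zfs set secondarycache=none zfs-raid1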

For testing purposes I am using magnetic disks /dev/sdb and /dev/sdc, each capable of roughly 100 random read IOPS. The single-disk figures were obtained with FIO against a single-disk ZFS pool.

My understanding is that the ZFS mirror should deliver approximately 200 (100 + 100) random read IOPS, since reads can be serviced by either side of the mirror. When testing, however, I am only seeing around 140 random read IOPS. The full results are below:

test@pc:/mnt/zfs-raid1# fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=0 --size=512M --numjobs=8 --runtime=240 --group_reporting

randread: (groupid=0, jobs=8): err= 0: pid=4293: Wed Nov 16 21:02:08 2016
  read : io=137040KB, bw=584482B/s, iops=142, runt=240091msec
    slat (usec): min=222, max=2246.9K, avg=56047.94, stdev=85252.98
    clat (usec): min=2, max=5142.9K, avg=838922.05, stdev=443521.12
     lat (msec): min=5, max=5401, avg=894.97, stdev=460.94
    clat percentiles (msec):
     |  1.00th=[   75],  5.00th=[  269], 10.00th=[  396], 20.00th=[  529],
     | 30.00th=[  619], 40.00th=[  693], 50.00th=[  766], 60.00th=[  848],
     | 70.00th=[  947], 80.00th=[ 1090], 90.00th=[ 1336], 95.00th=[ 1614],
     | 99.00th=[ 2507], 99.50th=[ 2835], 99.90th=[ 3720], 99.95th=[ 3884],
     | 99.99th=[ 4621]
    bw (KB  /s): min=    1, max=  851, per=12.92%, avg=73.67, stdev=43.13
    lat (usec) : 4=0.01%, 10=0.01%
    lat (msec) : 10=0.11%, 20=0.05%, 50=0.34%, 100=0.85%, 250=3.16%
    lat (msec) : 500=12.49%, 750=30.99%, 1000=26.12%, 2000=23.48%, >=2000=2.38%
  cpu          : usr=0.02%, sys=0.14%, ctx=99221, majf=0, minf=202
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.2%, 16=99.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=34260/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: io=137040KB, aggrb=570KB/s, minb=570KB/s, maxb=570KB/s, mint=240091msec, maxt=240091msec

The mirror was created using:

zpool create zfs-raid1 mirror /dev/sdb /dev/sdc

Is this the level of scalability expected? Or is there something I am missing?

Greg
  • Try with ARC enabled. – mzhaase Nov 16 '16 at 10:54
  • It's much better with ARC, however I am focusing on raw disk performance as the application layer does its own (optimized) caching – Greg Nov 16 '16 at 11:01
  • Your application doesn't cache metadata. – jlliagre Nov 16 '16 at 11:28
  • ARC is an integral part of ZFS; you cannot expect good performance if you disable it completely. All the metadata is cached in ARC and has to be read from disk every time if you don't have a cache. Performance enhancements like prefetch don't work without it. At least set primarycache to metadata. – mzhaase Nov 16 '16 at 11:30
  • You're right, that was quite silly of me. Caching metadata only results in a 2x performance improvement. If you write the answer I will accept it. – Greg Nov 16 '16 at 11:48

1 Answer


ZFS uses ARC not just for file caching, but for many performance optimizations, such as prefetch and, probably most importantly, metadata. If you have no cache, ZFS has to read the metadata from the pool every single time it needs it, which happens to be every read or write.

You can cache only metadata by setting primarycache=metadata instead of primarycache=all.
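
For example, something along these lines, using the pool name from the question:

# cache only ZFS metadata in ARC, not file data
zfs set primarycache=metadata zfs-raid1
# verify the setting
zfs get primarycache zfs-raid1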

However, ARC and an application-level cache do not have to be mutually exclusive, and prefetch might speed things up as well. I would therefore also test how the performance changes with primarycache=all.
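
For comparison, you could re-enable the full ARC on the pool and repeat the same fio run from the question:

# re-enable caching of both data and metadata in ARC
zfs set primarycache=all zfs-raid1
fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=0 --size=512M --numjobs=8 --runtime=240 --group_reporting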

This article might also be of interest: https://www.patpro.net/blog/index.php/2014/03/19/2628-zfs-primarycache-all-versus-metadata/

mzhaase