I am planning to use ZFS for backups. 5-10 servers will "stream" updates via DRBD to very large files (500 gigabytes each) on a ZFS file system.
The servers will generate about 20 MB/s of random writes each, roughly 100 MB/s in total. I won't read these files, so the access pattern should be almost 100% writes.
For me, copy-on-write is a very important feature.
As I understand it, COW should transform random writes into sequential writes, but this is not happening.
I tested on a server with 12 SAS drives, an E5520 Xeon (4 cores) and 24 GB RAM, and random write performance was very low.
I decided to debug it first on a single SAS HDD in the same server.
I created an EXT4 file system on it and ran some tests:
root@zfs:/mnt/hdd/test# dd if=/dev/zero of=tempfile bs=1M count=4096 conv=fdatasync,notrunc
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 30.2524 s, 142 MB/s
So sequential write speed is about 140 MB/s.
Random writes come in around 500 KB/s at 100-150 IOPS, which is normal for a single spinning disk:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [w(1)] [0.6% done] [0KB/548KB/0KB /s] [0/137/0 iops] [eta 02h:02m:57s]
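The numbers are consistent with each other: at 4K per write, the IOPS and throughput in the status line match up (a trivial sanity check, just to show the arithmetic):

# 137 random 4K writes per second ~= 548 KiB/s, matching fio's 548KB/s status line
echo "$((137 * 4)) KiB/s"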
Then I created a ZFS pool on the same drive:
zpool create -f -m /mnt/data bigdata scsi-35000cca01b370274
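For reference, that by-id name is the same physical disk that appears as sdg in the iostat output further down; a quick way to confirm the mapping:

# the by-id symlink points at the kernel device name (sdg here)
ls -l /dev/disk/by-id/scsi-35000cca01b370274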
I set the recordsize to 4K because the workload will be 4K random writes; in my testing a 4K recordsize performed better than the default 128K.
zfs set recordsize=4k bigdata
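To make sure the property actually took effect (recordsize only applies to blocks written after the change, so it has to be set before laying out the test files), something like this confirms it:

# confirm recordsize is 4K and that compression/dedup are off before running fio
zfs get recordsize,compression,dedup bigdata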
I tested random writes to a 4G file:
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=./test --filename=test --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
./test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
./test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/115.9MB/0KB /s] [0/29.7K/0 iops] [eta 00m:00s]
Looks like COW did well here: 115.9 MB/s.
Then I tested random writes to a 16G file:
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=./test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [w(1)] [0.1% done] [0KB/404KB/0KB /s] [0/101/0 iops] [eta 02h:08m:55s]
Very poor results: about 400 KB/s.
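To see what the disk itself is doing while fio runs, I also watch it with iostat in another terminal; the iostat snapshots below come from roughly this invocation (flags from memory):

# extended per-device stats in MB, refreshed every second, for the disk backing the pool
iostat -xm sdg 1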
I tried the same with an 8G file:
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=./test8G --bs=4k --iodepth=1 --size=8G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 8192MB)
Jobs: 1 (f=1): [w(1)] [14.5% done] [0KB/158.3MB/0KB /s] [0/40.6K/0 iops] [eta 00m:53s]
At the beginning COW was fine, about 136 MB/s:
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00 1120.00     0.00   136.65   249.88     9.53    8.51    0.00    8.51   0.89  99.24
But towards the end, when the test reached about 90%, write speed dropped to around 5 MB/s:
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg               0.00     0.00    0.00  805.90     0.00     5.33    13.54     9.95   12.34    0.00   12.34   1.24 100.00
So a 4G file is fine and 8G is almost fine, but a 16G file does not seem to get any benefit from COW.
I don't understand what is happening here; maybe memory caching plays a role.
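If caching is the explanation, the ARC and the dirty-data write throttle should show it while the 16G test runs. This is roughly how I would check on ZFS on Linux (parameter names may differ between versions):

# current ARC size and its configured ceiling
grep -wE 'size|c_max' /proc/spl/kstat/zfs/arcstats

# how much dirty (not yet committed) data ZFS allows before throttling writers
cat /sys/module/zfs/parameters/zfs_dirty_data_max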
OS: Debian 8, ZFS pool version 5000. No compression or deduplication.
zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
bigdata  1.81T  64.4G  1.75T         -     2%     3%  1.00x  ONLINE  -

root@zfs:/mnt/data/test# zdb
bigdata:
    version: 5000
    name: 'bigdata'
    state: 0
    txg: 4
    pool_guid: 16909063556944448541
    errata: 0
    hostid: 8323329
    hostname: 'zfs'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 16909063556944448541
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8547466691663463210
            path: '/dev/disk/by-id/scsi-35000cca01b370274-part1'
            whole_disk: 1
            metaslab_array: 34
            metaslab_shift: 34
            ashift: 9
            asize: 2000384688128
            is_log: 0
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

zpool status bigdata
  pool: bigdata
 state: ONLINE
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        bigdata                   ONLINE       0     0     0
          scsi-35000cca01b370274  ONLINE       0     0     0

errors: No known data errors
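The pool version 5000 above comes from zdb; to also record the kernel module version, something like this should work on a standard ZFS on Linux install:

# ZFS on Linux kernel module / package version
modinfo zfs | grep -i '^version'
cat /sys/module/zfs/version 2>/dev/null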
fio doesn't work with O_DIRECT on ZFS, so I had to run without it. As I understand it, buffered writes should produce even better results, but that is not happening:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=./test --filename=test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite
./test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
fio-2.1.11
Starting 1 process
fio: looks like your file system does not support direct=1/buffered=0
fio: destination does not support O_DIRECT
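So the 16G numbers above are with buffered I/O. If I wanted something closer to O_DIRECT semantics, the only variant I can think of is forcing a flush after every write, though that changes the workload (a sketch only, not what I measured above):

# buffered 4K random writes with an fsync after every write, so the page cache
# cannot hide the disk behaviour (not equivalent to O_DIRECT, just closer)
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=./test16G --bs=4k --iodepth=1 --size=16G --readwrite=randwrite --fsync=1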