I have huge performance issues using MongoDB (I believe it is an mmapped DB) with ZFSonLinux.
Our MongoDB does almost only writes. On replicas without ZFS, the disk is completely busy in ~5 s spikes when the app writes into the DB every 30 s, with no disk activity in between, so I take that as the baseline behaviour to compare against.
On replicas with ZFS, the disk is completely busy all the time, and those replicas are struggling to keep up to date with the MongoDB primary. I have lz4 compression enabled on all replicas, and the space savings are great, so much less data should be hitting the disk.
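(For what it's worth, the savings can be checked per dataset with something like the following; dataset names are the ones from the zfs list output below:)
# achieved compression ratio and space used per dataset
zfs get compressratio,used,referenced zfs/mongo_data-rum_a zfs/mongo_data-rum_old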
So on these ZFS servers, I first had the default recordsize=128k. Then I wiped the data and set recordsize=8k before resyncing the Mongo data. Then I wiped again and tried recordsize=1k. I also tried recordsize=8k without checksums.
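Roughly, each attempt went along these lines (a sketch; the dataset name is the one shown in the zfs list output below). Since recordsize only applies to data written after the change, the wipe and resync was needed each time:
# recordsize only affects newly written blocks, so data must be rewritten afterwards
zfs set recordsize=8k zfs/mongo_data-rum_a
# variant tested without checksums
zfs set checksum=off zfs/mongo_data-rum_a
# then: stop mongod, wipe the dbpath on this dataset, restart and let the
# replica do a full initial sync from the primary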
Nevertheless, none of this solved anything; the disk was always kept 100% busy. Only once, on one server with recordsize=8k, was the disk much less busy than on any non-ZFS replica, but after trying different settings and going back to recordsize=8k, the disk was at 100% again; I could not reproduce the previous good behaviour, and could not see it on any other replica either.
Moreover, there should be almost only writes, yet on all replicas, under all the different settings, the disk is completely busy with 75% reads and only 25% writes.
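(For reference, this kind of read/write split shows up in plain device and pool stats, along the lines of:)
# per-device read/write rates and %util, 5 s interval
iostat -x 5
# same picture from the pool's side
zpool iostat -v zfs 5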
(Note: I believe MongoDB is an mmapped DB. I was told to try MongoDB in AIO mode, but I did not find how to set that, and from another server running MySQL InnoDB I realised that ZFSonLinux did not support AIO anyway.)
My servers run CentOS 6.5, kernel 2.6.32-431.5.1.el6.x86_64, with spl-0.6.2-1.el6.x86_64 and zfs-0.6.2-1.el6.x86_64.
#PROD 13:44:55 root@rum-mongo-backup-1:~: zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
zfs                       216G  1.56T    32K  /zfs
zfs/mongo_data-rum_a     49.5G  1.56T  49.5G  /zfs/mongo_data-rum_a
zfs/mongo_data-rum_old    166G  1.56T   166G  /zfs/mongo_data-rum_old
#PROD 13:45:20 root@rum-mongo-backup-1:~: zfs list -t snapshot
no datasets available
#PROD 13:45:29 root@rum-mongo-backup-1:~: zfs list -o atime,devices,compression,copies,dedup,mountpoint,recordsize,casesensitivity,xattr,checksum
ATIME  DEVICES  COMPRESS  COPIES  DEDUP  MOUNTPOINT               RECSIZE  CASE       XATTR  CHECKSUM
  off       on       lz4       1    off  /zfs                        128K  sensitive     sa       off
  off       on       lz4       1    off  /zfs/mongo_data-rum_a         8K  sensitive     sa       off
  off       on       lz4       1    off  /zfs/mongo_data-rum_old       8K  sensitive     sa       off
What could be going on here? What should I look at to figure out what ZFS is doing, or which setting is badly configured?
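For instance, is the ARC the right place to look first, e.g. via its kstats? (A sketch, assuming a stock ZoL layout; arcstat.py only if this build ships it:)
# ARC target size and hit/miss counters exposed by ZoL
egrep '^(size|c_max|hits|misses)' /proc/spl/kstat/zfs/arcstats
# continuous view, if arcstat.py is installed with this ZoL release
arcstat.py 5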
EDIT1:
Hardware: these are rented servers, 8 vcores on a Xeon 1230 or 1240, 16 or 32 GB RAM, with zfs_arc_max=2147483648, using HP hardware RAID1. So the ZFS zpool is on /dev/sda2 and does not know that there is an underlying RAID1. Even if this is a suboptimal setup for ZFS, I still do not understand why the disk is choking on reads while the DB does only writes.
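(For completeness, zfs_arc_max is set as a module option, i.e. something like the following in /etc/modprobe.d/; the exact file name is just our convention:)
# /etc/modprobe.d/zfs.conf -- cap the ARC at 2 GiB
options zfs zfs_arc_max=2147483648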
I understand the many reasons, which we do not need to rehash here, why this setup is bad for ZFS, and I will soon have a JBOD/no-RAID server on which I can run the same tests with ZFS's own RAID1 implementation on an sda2 partition, with the /, /boot and swap partitions on software RAID1 with mdadm. A rough sketch of that layout follows.
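(Roughly what I have in mind for that box; device names are placeholders, and ashift=12 assumes 4K-sector disks:)
# ZFS mirror directly on the second partition of each disk
zpool create -o ashift=12 zfs mirror /dev/sda2 /dev/sdb2
# /, /boot and swap stay on mdadm RAID1 on the other partitions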