The pool consists of two HDDs (a WD Red 3 TB, 5400 RPM, max transfer rate 147 MB/s, and a Verbatim (Toshiba) 3 TB, 7200 RPM) in a raidz1-0 configuration. It holds 2.25 TB of data, duplicated across the two disks, so the total amount is 4.5 TB. I did not specify an ashift value when I created the pool.
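The ashift the pool actually ended up with can be checked with zdb; a minimal sketch, assuming the pool is named tank:

$ zdb -C tank | grep ashift    # "tank" is a placeholder pool name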
zpool status shows "scan: scrub repaired 0 in 32h43m with 0 errors on Sun Jan 3 13:58:54 2021". This means that the scan speed was only 4.5e6 MB / (32.717 * 60 * 60 s) = 38.2 MB/s. I'd expect at least 2 x 100 MB/s, or up to 2 x 200 MB/s, although the WD disk is somewhat slower than the other.
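For what it's worth, the same arithmetic as a one-liner:

$ awk 'BEGIN { printf "%.1f MB/s\n", 4.5e6 / (32*3600 + 43*60) }'    # 4.5e6 MB in 32 h 43 m
38.2 MB/s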
SMART data of the disks shows that everything is healthy. They have 6.5 to 7 years of power-on time, but the start-stop count is only about 200.
So the main question: What might explain the poor read performance?
Oddly, zdb showed that the pool uses the path /dev/disk/by-id/ata-WDC_WD30EFRX-xyz-part1 rather than /dev/disk/by-id/ata-WDC_WD30EFRX-xyz. fdisk -l /dev/disk/by-id/ata-WDC_WD30EFRX-xyz mentions that "Partition 1 does not start on physical sector boundary", but I have read that this should only hurt write performance. I might try fixing it by removing the device and adding it back with the proper full-disk path, since the data is duplicated (and backed up).
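If I go that route, the replace-in-place would look roughly like this (a sketch only; "tank" is a placeholder pool name, and a two-disk raidz1 has no redundancy left while one disk is out, hence the backup):

$ zpool offline tank ata-WDC_WD30EFRX-xyz-part1                   # "tank" is a placeholder pool name
$ zpool labelclear -f /dev/disk/by-id/ata-WDC_WD30EFRX-xyz-part1  # wipe the old ZFS label
$ zpool replace tank ata-WDC_WD30EFRX-xyz-part1 /dev/disk/by-id/ata-WDC_WD30EFRX-xyz
$ zpool status tank                                               # wait for the resilver to finish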
The pool has 7.1 million files. I tested running sha1sum on a 14276 MB file after clearing the caches via /proc/sys/vm/drop_caches; it took 2 min 41 s, putting the read speed at 88.5 MB/s.
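The test was essentially of this shape (the file path is a placeholder; run as root so the caches can be dropped):

$ sync
$ echo 3 > /proc/sys/vm/drop_caches              # drop the page cache so the file is read from disk
$ time sha1sum /tank/path/to/the-14276-MB-file   # placeholder path; took 2 min 41 s here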
dd bs=1M count=4096 if=/dev/disk/by-id/ata-WDC_WD30EFRX-xyz of=/dev/null reported a speed of 144 MB/s; running it on ata-WDC_WD30EFRX-xyz-part1 reported 134 MB/s, and on ata-TOSHIBA_DT01ACA300_xyz it reported 195 MB/s.
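Spelled out, the three raw-device reads were:

$ dd bs=1M count=4096 if=/dev/disk/by-id/ata-WDC_WD30EFRX-xyz of=/dev/null         # 144 MB/s
$ dd bs=1M count=4096 if=/dev/disk/by-id/ata-WDC_WD30EFRX-xyz-part1 of=/dev/null   # 134 MB/s
$ dd bs=1M count=4096 if=/dev/disk/by-id/ata-TOSHIBA_DT01ACA300_xyz of=/dev/null   # 195 MB/s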
My NAS runs quite old software versions:
$ modinfo zfs
filename: /lib/modules/3.11.0-26-generic/updates/dkms/zfs.ko
version: 0.6.5.4-1~precise
license: CDDL
author: OpenZFS on Linux
description: ZFS
srcversion: 5FC0B558D497732F17F4202
depends: spl,znvpair,zcommon,zunicode,zavl
vermagic: 3.11.0-26-generic SMP mod_unload modversions
It has 24 GB of RAM, 8 GB of which is reserved for a JVM, so the rest should be free to use, although not that much of it seems to be free:
$ free -m
             total       used       free     shared    buffers     cached
Mem:         23799      21817       1982          0        273       1159
-/+ buffers/cache:      20384       3415
Swap:         7874         57       7817
Edit 1:
I did some tests with bonnie++, using a single 4 GB file on the RAIDZ: write 75.9 MB/s, rewrite 42.2 MB/s, and read 199.0 MB/s. I assume I converted the "kilo-characters / second" figures correctly.
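The run was along these lines (the dataset path and user are placeholders; bonnie++ reports its throughput in K/sec, which is where the conversion comes from):

$ bonnie++ -d /tank/benchmark -s 4096 -n 0 -u someuser   # placeholder path and user; 4096 MiB test file, no small-file phase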
Ah, just now I realized that the parallel scrub takes as long as the slowest 5400 RPM disk needs; it doesn't matter that the 7200 RPM disk was (possibly) scrubbed faster.
Edit 2:
I reduced the number of files in the pool from 7.1 million to 4.5 million (-36.6%), and the scrub time dropped from 32.72 hours to 16.40 hours (-49.9%). The amount of data is the same, since I just packed those small files into a ZIP archive with a low compression level.
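Packing small files into a low-compression ZIP can be done along these lines (paths are placeholders; -r recurses, -1 picks the fastest/lowest compression level):

$ zip -r -1 /tank/archive/smallfiles.zip /tank/data/smallfiles/   # placeholder paths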
I also increased the recordsize from 128k to 512k; no clue if this made a difference in this case. Other pre-existing data was not touched, so it retains the original recordsize. Oh, and /sys/module/zfs/parameters/zfs_scan_idle was set to 2.
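For reference, those two tweaks look roughly like this (the dataset name is a placeholder; the new recordsize only applies to files written afterwards):

$ zfs set recordsize=512k tank/data                    # "tank/data" is a placeholder dataset name
$ echo 2 > /sys/module/zfs/parameters/zfs_scan_idle    # as root; a lower value makes the scrub yield less to other I/O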