I am running ZFS on Ubuntu 20.04 LTS ontop of hyper-v. At random, currently every 2-3 days, ZFS gets stuck. Samba is non-responsive, File I/O via smb to the ZFS volume ceases and actions on the shell on the volume makes the shell freeze. Ubuntu is still responsive, only ZFS gets stuck.
Syslog:
Oct 31 13:21:07 zfs kernel: [160466.002788] INFO: task z_rd_int:1200 blocked for more than 120 seconds. Oct 31 13:21:07 zfs kernel: [160466.002793] Tainted: P O 5.4.0-52-generic #57-Ubuntu Oct 31 13:21:07 zfs kernel: [160466.002793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 31 13:21:07 zfs kernel: [160466.002795] z_rd_int D 0 1200
2 0x80004000 Oct 31 13:21:07 zfs kernel: [160466.002797] Call Trace: Oct 31 13:21:07 zfs kernel: [160466.002804] __schedule+0x2e3/0x740 Oct 31 13:21:07 zfs kernel: [160466.002806] schedule+0x42/0xb0 Oct 31 13:21:07 zfs kernel: [160466.002807] io_schedule+0x16/0x40 Oct 31 13:21:07 zfs kernel: [160466.002809] rq_qos_wait+0x106/0x180 Oct 31 13:21:07 zfs kernel: [160466.002811] ? __wbt_done+0x40/0x40 Oct 31 13:21:07 zfs kernel: [160466.002812] ? sysv68_partition+0x2d0/0x2d0
[...]
Oct 31 13:21:07 zfs kernel: [160466.003297] INFO: task z_wr_iss:1206 blocked for more than 120 seconds. Oct 31 13:21:07 zfs kernel: [160466.003300] Tainted: P O 5.4.0-52-generic #57-Ubuntu Oct 31 13:21:07 zfs kernel: [160466.003301] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 31 13:21:07 zfs kernel: [160466.003302] z_wr_iss D 0 1206
2 0x80004000 Oct 31 13:21:07 zfs kernel: [160466.003304] Call Trace: Oct 31 13:21:07 zfs kernel: [160466.003306] __schedule+0x2e3/0x740 Oct 31 13:21:07 zfs kernel: [160466.003308] schedule+0x42/0xb0 Oct 31 13:21:07 zfs kernel: [160466.003309] io_schedule+0x16/0x40 Oct 31 13:21:07 zfs kernel: [160466.003310] rq_qos_wait+0x106/0x180 Oct 31 13:21:07 zfs kernel: [160466.003312] ? __wbt_done+0x40/0x40 Oct 31 13:21:07 zfs kernel: [160466.003313] ? sysv68_partition+0x2d0/0x2d0
In HTOP you can see that there are 8 tasks that are hanging in D-State.
I need to reboot the whole VM to get ZFS to respond again.
Is this phenomenon known? what can be done to prevent ZFS from hanging? Some info:
Ubuntu 20.04LTS running ontop of Hyper-V
16 Gigs RAM, 16 V-Cores (3950x with ECC RAM).
ZFS Volume is a ZFS-Z1 array with 8 x 5TB drives. Compression is LZ4, no dedup
I tuned L1ARC to only use 6GB of RAM. You can see in the screenshot that the VM still has 5 GB of free RAM when the freeze occurs. So I don't think its an OOM szenario.
Other than the L1Arc size, "xattr=sa" and "sync=never" (I know....) no tuning has been done, everything out of the box.