
I have three disks making up a RAID-Z vdev in a ZFS pool on Ubuntu Server 16.04.2. They are connected via a cheap PCIe SATA card, a single eSATA cable, and a port multiplier at the other end.
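
For reference, the layout can be double-checked with the commands below (a sketch only; the pool name "tank" is illustrative, not taken from the actual system):

# Show the RAID-Z vdev and the state of each member disk
# ("tank" is an illustrative pool name)
zpool status tank
# Map each /dev/sdX to its SATA controller / port-multiplier path
ls -l /dev/disk/by-path/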

iostat shows these disks are performing extremely poorly, as per the below:

[iostat screenshot: very high wait times on the three disks behind the port multiplier]
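
The screenshot isn't reproduced here, but the figures came from iostat's extended device statistics, i.e. something along these lines (the exact flags used for the screenshot may have differed):

# Extended per-device stats (throughput, queue size, average wait), every 5 seconds
iostat -dx 5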

But I'm struggling to understand why. Both the controller (a Syba SI-PEX40064) and the port multiplier (an unbranded one with a SiI3726 chipset) support port multiplication and FIS-based switching.

If it were a single disk failing, I would expect the wait time to be high on only one disk, not on all three attached via the port multiplier.

These disks were only recently installed in this configuration (2-3 weeks ago), and this issue has only appeared in the last few hours despite constant use of the pool. I'm not sure how ZFS works internally; I suppose it's possible it wasn't writing to those disks until now?
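
A quick way to check whether ZFS is actually driving I/O to each disk in the vdev (sketch only; "tank" is again an illustrative pool name):

# Per-vdev and per-disk bandwidth/IOPS as seen by ZFS, refreshed every 5 seconds
zpool iostat -v tank 5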

Any suggestions on what to investigate, or potential factors that may result in this would be greatly appreciated!

DD speed test

root@server:~# dd if=/dev/sdi of=/dev/null bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.00211904 s, 4.9 GB/s
root@server:~# dd if=/dev/sdi of=/dev/null bs=1M count=10 iflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 12.5821 s, 833 kB/s
root@server:~# dd if=/dev/sdj of=/dev/null bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.00196007 s, 5.3 GB/s
root@server:~# dd if=/dev/sdj of=/dev/null bs=1M count=10 iflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 11.6849 s, 897 kB/s
root@server:~# dd if=/dev/sdk of=/dev/null bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 40.8416 s, 257 kB/s
root@server:~# dd if=/dev/sdk of=/dev/null bs=1M count=10 iflag=direct
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 6.79282 s, 1.5 MB/s
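
Note that the runs above without iflag=direct are essentially measuring the page cache (the start of these disks had presumably already been read), while the iflag=direct runs bypass the cache and reflect the actual devices. A rough sketch of a cache-free re-test plus a kernel-log check for link problems (device name and grep pattern are illustrative):

# Drop the page cache so even a buffered read has to hit the disk (run as root)
sync
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/sdi of=/dev/null bs=1M count=100

# Look for SATA link resets/errors, which often point at the controller or multiplier
dmesg | grep -iE 'sata link|hard resetting|ata[0-9]+'
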
Sam Martin
    Can you read from the raw disks at higher speed? For example, what do you get executing `dd if=/dev/sdh of=/dev/null bs=1M count=512`, with and without the additional `oflag=direct` parameter? – shodanshok Jul 02 '17 at 14:10
  • 133MB/sec! So yes, though `oflag=direct` doesn't seem to be supported by /dev/null as I get `Invalid Argument` – Sam Martin Jul 02 '17 at 14:16
  • Sorry, I meant **`iflag=direct`**. Run these two dd commands on all three disks and report back here. – shodanshok Jul 02 '17 at 14:21
  • @shodanshok It's now cached, I'm guessing, given the 6 GB/sec result, but iflag=direct still shows a standard single-disk SATA read speed, so it looks like it's not fundamental to the hardware! Does this mean it's ZFS being weird? (Oh hang on, running it on all three...) Damn, this is taking ages on `sdi`; I think we may have found the problem. – Sam Martin Jul 02 '17 at 14:25
  • Ah, apologies, I mishighlighted in the screenshot. `sdh` is directly attached; performance seems to be pretty much identical on `sdi`, `sdj`, and `sdk` (i.e. 300 kB/sec). See the updated dd output in the original post, @shodanshok (I had to reduce it to 10 MB as 512 was taking too long). – Sam Martin Jul 02 '17 at 14:48
  • With 300 kB/s sustained read throughput, you surely have a hardware problem with your SATA controller and/or multiplier. You should try your SATA controller with a single drive (excluding the SATA multiplier) to see if it makes a difference. – shodanshok Jul 02 '17 at 14:51
  • Your first line says it's a cheap SATA controller, so I think you already know the obvious: cheap hardware is dragging everything else down. I don't think you can do a magic trick to bypass cheap hardware. – yagmoth555 Jul 02 '17 at 20:15
  • To follow up on this @shodanshok, I've removed the port multiplier from the equation and everything performs as expected. Thanks for all your help. – Sam Martin Jul 21 '17 at 17:11

0 Answers