
It seems a Linux mdadm array delivers fewer writes and lower IOPS as more disks are added. For example, I have tested the following configurations, all with the defaults aside from changing the I/O scheduler to deadline and the tuned-adm profile to throughput-performance (the tuning is sketched below).
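For reference, that tuning amounted to roughly the following (sdb through sdk are placeholders for the ten member SSDs; the scheduler setting does not persist across reboots):

```sh
# System-wide tuning profile
tuned-adm profile throughput-performance

# Set the deadline I/O scheduler on each member disk
# (device names are placeholders for the ten SSDs)
for dev in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk; do
    echo deadline > /sys/block/$dev/queue/scheduler
done
```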

The motherboard has dual E5 processors, DDR4 RAM and 10x SATA3 ports. The SSDs are 10x Samsung 850 Pro drives. The OS is CentOS 7 64-bit (CentOS 6.7 was really bad). The filesystem is XFS.

With roughly 4-6 drives, sequential writes bypassing the cache are roughly 800 MB/s to 1 GB/s. Writing through the cache is roughly 2-3 GB/s.
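For illustration, the cache-bypassing figure corresponds to direct writes and the cached figure to ordinary buffered writes, something along these lines (the mount point and sizes are examples, not the exact commands used):

```sh
# Sequential write bypassing the page cache (direct I/O)
dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=8192 oflag=direct

# Same write through the page cache; throughput is inflated
# because nothing forces the data onto the disks
dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=8192
```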

Running various fio tests, IOPS seem to top out at about 80,000 with the direct flag and, of course, 800,000+ without it.
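The fio jobs were of roughly this shape (the file path, size, queue depth and job count are representative placeholders, not the exact parameters used):

```sh
# 4 KB random writes with O_DIRECT against a file on the XFS-backed array
fio --name=randwrite --filename=/mnt/md0/fio.test --size=10G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 \
    --numjobs=4 --direct=1 --runtime=60 --time_based --group_reporting

# The same job without --direct=1 runs through the page cache,
# which is where the 800,000+ "IOPS" figure comes from
```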

Chunk size is 512k, the default. Partitions seem to be aligned properly.
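Checking both looks roughly like this, assuming the array is /dev/md0 and /dev/sdb stands in for any member disk:

```sh
# Confirm the chunk size and layout mdadm actually used
mdadm --detail /dev/md0
cat /proc/mdstat

# Show partition start sectors on a member disk to verify alignment
parted /dev/sdb unit s print
```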

When more disks are added to the array, the IOPS stay the same across the board, at roughly 60,000-80,000, and do not scale up linearly with the additional drives.

Additionally, when more drives are added, sequential writes seem to nosedive, as if the array were just a single drive. Testing a single drive for both IOPS and sequential writes yields about 70,000 IOPS (depending on the read/write percentage) and 400-500 MB/s. Sequential writes are slightly lower with all 10 drives in the array, between 300-500 MB/s.
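The single-drive IOPS figure came from a mixed random read/write job of roughly this form (the 70/30 mix, the /dev/sdx device name and the other parameters are placeholders; the job overwrites the target, so only point it at a standalone drive, not an array member):

```sh
# Mixed random read/write IOPS on one standalone SSD
# (destructive: writes directly to the device)
fio --name=single-mixed --filename=/dev/sdx --rw=randrw --rwmixwrite=30 \
    --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting
```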

The sequential writes are not a deal-breaker; however, I am wondering if there is a bottleneck or limitation within mdadm that is being overlooked. With 4-6 drives it performs awesome. Beyond 6 drives, performance stays the same or drops off, especially for sequential writes.

EDIT: After some additional testing, I'm able to get the sequential speeds up when doing very large writes, such as 20 GB, 40 GB, 80 GB, etc. A dd test with 42 GB yielded 640 MB/s with fdatasync.
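That large-write dd test was roughly of this form (the path is a placeholder; 40960 x 1 MiB is about 42 GB):

```sh
# Large sequential write, flushed to stable storage before dd reports throughput
dd if=/dev/zero of=/mnt/md0/bigfile bs=1M count=40960 conv=fdatasync
```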

I also understand dd is not ideal for benchmarking SSDs; that's not my question. I am trying to understand where the drop-off is coming from when going beyond 4-6 disks.

    You are probably hitting the hardware limits of the onboard SATA controller. – Michael Hampton May 14 '16 at 03:56
  • Thanks for the comment. This is an interesting point; the chipset is an Intel C612. I will do some more research on this as well as some further testing with an HBA. I have also gotten slightly higher sequentials with a larger file size. I've edited my post to reflect that. – Bill May 14 '16 at 04:18
  • I too think that the onboard controller is the bottleneck. You could try using an LSI 9207-8i HBA card. [These benchmarks](http://www.thessdreview.com/our-reviews/sata-3/lsi-sas-9207-8i-pcie-3-0-host-bus-adapter-quick-preview/) show that it is able to fully utilize all the SSDs. The only problem is that LSI HBAs do not pass the TRIM command for Samsung SSDs. Also try setting the disk scheduler to "noop". – A.Jesin May 29 '16 at 17:00
  • Thanks for the comment. Yes, the only thing I had not tried was an HBA; honestly, the whole point of mdadm was to remove the RAID card/HBA from the equation. I have also tried different schedulers, and the performance difference was negligible. I will try out an HBA and see how it performs. – Bill Jun 01 '16 at 03:54

0 Answers