
We are setting up an ADAPT0 (RAID-60-like) configuration for a file server.

We have six disk pools. Each consists of 14 disks and is set up using ADAPT. According to Dell's official white paper, ADAPT is similar to RAID 6 but distributes spare capacity. On page 13, it is indicated that the chunk size is 512 KiB and that the stripe width is 4 MiB (over 8 disks) for each disk pool.

My understanding is that for each 14-disk pool, 2 disks' worth of capacity is reserved as spare, 20% of the remaining 12 disks (2.4 disks' worth of capacity) is used for parity, and 80% (9.6 disks' worth) is used for storage. The chunk size is still 512 KiB and the stripe width remains 4 MiB, since we only ever write to 8 disks in one contiguous block.
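
As a quick sanity check of those figures (plain shell arithmetic, nothing ADAPT-specific):

$ echo "(14 - 2) * 0.2" | bc   # capacity used for parity, in disks
2.4
$ echo "(14 - 2) * 0.8" | bc   # capacity left for data, in disks
9.6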

To achieve an ADAPT0 (RAID-60-like) configuration, we then created a logical volume that stripes over two disk pools using LVM. Our intent is to eventually have 3 striped volumes, each striping over two disk pools. We used a stripe size that matches that of the hardware RAID (512 KiB):

$ vgcreate vg-gw /dev/sda /dev/sdb
$ lvcreate -y --type striped -L 10T -i 2 -I 512k -n vol vg-gw
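
To sanity-check that the stripe count and stripe size took effect, the segment mapping can be inspected with the stock LVM/device-mapper tools (a sketch; dmsetup should report the striped target with a 1024-sector, i.e. 512 KiB, chunk):

$ lvdisplay -m /dev/vg-gw/vol
$ dmsetup table vg--gw-vol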

Next, we set up an XFS file system on the striped logical volume. Following guidelines from XFS.org and a few other sources, we matched the stripe unit su to the LVM and RAID stripe size (512k) and set the stripe width sw to 16, since we have 16 "data disks".

$ mkfs.xfs -f -d su=512k,sw=16 -l su=256k /dev/mapper/vg--gw-vol
$ mkdir -p /vol/vol
$ mount -o rw -t xfs /dev/mapper/vg--gw-vol /vol/vol
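
As a cross-check, xfs_info should echo the geometry back; sunit/swidth are reported in filesystem blocks, so with the default 4 KiB block size we would expect sunit=128 and swidth=2048 here:

$ xfs_info /vol/vol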

We benchmarked sequential I/O performance with a 4 KiB block size on /dev/sda, /dev/sdb and /dev/mapper/vg--gw-vol using

fio --name=test --ioengine=posixaio --rw=rw --bs=4k --numjobs=1 --size=256g --iodepth=1 --runtime=300 --time_based --end_fsync=1

We were surprised to obtain similar performance from all three:

       Volumes         Throughput   Latency
---------------------  ----------  ----------
/dev/sda                198MiB/s    9.50 usec
/dev/sdb                188MiB/s   10.11 usec
/dev/mapper/vg--gw-vol  209MiB/s    9.06 usec

Using the I/O monitoring tool bwm-ng, we can see I/O to both /dev/sda and /dev/sdb when writing to /dev/mapper/vg--gw-vol.
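
For completeness, the same per-device split can also be watched with iostat from sysstat (a sketch; the dm-N name of the logical volume may differ):

$ iostat -xm sda sdb dm-0 2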

Did we configure this properly? More specifically:

(1) Was it correct to align the LVM stripe size to that of the hardware RAID (512 KiB)?

(2) Was it correct to align the XFS stripe unit and width as we did (512 KiB stripe size and 16 data disks), or should we "abstract away" the underlying volumes (4 MiB stripe size and 2 data disks)?

(3) Adding to the confusion is the self-reported output of the block devices here:

$ grep "" /sys/block/sda/queue/*_size
/sys/block/sda/queue/hw_sector_size:512
/sys/block/sda/queue/logical_block_size:512
/sys/block/sda/queue/max_segment_size:65536
/sys/block/sda/queue/minimum_io_size:4096
/sys/block/sda/queue/optimal_io_size:1048576
/sys/block/sda/queue/physical_block_size:4096

Thank you!

Nicolas De Jay
  • I just read through that whitepaper. Very interesting. I wonder why you chose to make two disk pools and try to stripe them? It seems that one would be perfectly fine, and just as performant if not more so, and less complex to set up. – Michael Hampton Jul 15 '20 at 01:15
  • I omitted this from the post, but the storage (Dell EMC ME4084) is equipped with two hardware RAID controllers. Each of the 14-disk disk pools is associated with a dedicated hardware RAID controller. We were hoping to fully leverage the two controllers in this manner. What do you think? – Nicolas De Jay Jul 15 '20 at 13:04
  • OK, that makes sense. I expect it might not make any difference though, with as few disks as you have compared to the capacity of the storage array. You're probably going to end up doing a lot of benchmarks. – Michael Hampton Jul 15 '20 at 13:16
  • Thanks for your feedback. We have a total of 6 such 14-disk disk pools and intend to set up a total of 3 striped volumes. The file server is for an HPC cluster, and each striped volume will be dedicated to a different research group. This is where we thought that leveraging both controllers would be beneficial to overall performance. – Nicolas De Jay Jul 15 '20 at 14:01
  • I am concerned that I am unable to see an increase in performance by striping (I edited the OP) using a sequential I/O test. It is unclear to me whether the test is bad or the stripe was not set up optimally. – Nicolas De Jay Jul 15 '20 at 14:02
  • Hm. Since the storage array itself is massively striping everything that you write across all the disks, I don't know if any stripe width you set in LVM or XFS is going to make much difference, as long as it's the ADAPT array's chunk size (512KiB), or a multiple of it. – Michael Hampton Jul 15 '20 at 14:06

1 Answer


I would avoid inserting a RAID0 layer on top of ADAPT. Rather, I would create a simple linear LVM pool spanning the two arrays or, alternatively, create a single 28-disk array (not utilizing the second controller at all).
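
For illustration, the linear variant of the question's LVM commands would look something like this (a sketch; lvcreate defaults to a linear mapping when no -i/--stripes option is given):

$ vgcreate vg-gw /dev/sda /dev/sdb
$ lvcreate -y -L 10T -n vol vg-gw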

With a linear LVM concatenation of the two arrays, XFS will give you added performance by virtue of its own allocation-group strategy, because the filesystem concurrently issues multiple IOs to various LBA ranges.

However, a single 28-disk pool should provide slightly better space efficiency, since less total capacity is reserved for spares relative to user data.

Regarding XFS options, you should use su=512k,sw=8 based on the ADAPT layout. Anyway, with a high-end controller equipped with a large power-loss-protected write cache, this should have only a minor effect.
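
In terms of the mkfs invocation from the question, that would be roughly (a sketch reusing the same device path; only the geometry changes):

$ mkfs.xfs -f -d su=512k,sw=8 -l su=256k /dev/mapper/vg--gw-vol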

shodanshok
  • Thanks for the feedback. Can you elaborate further on why you would do away with LVM striping on top of ADAPT? Is it only in this particular situation, where XFS can effectively load balance I/O through a linear logical volume, such that even RAID6 would be equivalent to RAID60? – Nicolas De Jay Jul 16 '20 at 13:21
  • Because the golden rule in storage is to avoid unnecessary complexity and/or layering, especially between different RAID levels. Regarding XFS, using it on top of two RAID6 volumes is *not* the same as using a single RAID60; rather, it provides increased performance through higher concurrency in file access (as directories are assigned to different allocation groups). Anyway, remember to bench/test with a representative workload: this is the only way to be (more or less) sure that a specific storage layout is the best one. – shodanshok Jul 16 '20 at 14:20
  • Thanks for the insights! Another motivation for striping was to fill up both disk pools evenly to increase performance, as reads and writes would hit more disks at a time, but maybe we should try an ADAPT volume over a 28-disk array and compare their performance as you suggested. – Nicolas De Jay Jul 17 '20 at 00:33