
EDIT: Solutions to my issue are below

I want to create a well optimized RAID6 array with mdadm with an XFS partition on a CentOS system. It'll involve ten ST3000DM001 Seagate 3TB drives and I'll be writing a few terabytes at a time, all in files of over 100 MB, over a gigabit ethernet connection. The more I read about configuring RAIDs (making things align with each other etc) the more I feel like I don't know what I'm doing.

Specifically I'd love some help with my mdadm --create and mkfs.xfs commands, and advice on if I need to change anything else in the hardware or OS.

From looking at online resources my best guess is

RAID6 Chunk size: 2MiB (this is just a guess)

XFS Block size: 4KiB (I'm assuming that's the pagesize for my system and I gather I can't go higher)

XFS agcount: 64 (this is just a guess)

sunit=4096, swidth=32768 (both in 512-byte sectors: sunit is the 2 MiB chunk, swidth is sunit × 8 data drives)
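For reference, those sunit/swidth numbers fall straight out of the chunk size and drive count. A minimal sketch of the arithmetic, assuming mkfs.xfs's convention that `-d sunit=`/`swidth=` are given in 512-byte sectors:

```shell
chunk_kib=2048                      # mdadm chunk size in KiB (2 MiB)
data_disks=8                        # 10 drives minus 2 parity for RAID6
sunit=$((chunk_kib * 1024 / 512))   # chunk size in 512-byte sectors
swidth=$((sunit * data_disks))      # full stripe width in 512-byte sectors
echo "sunit=$sunit swidth=$swidth"  # prints: sunit=4096 swidth=32768
```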

So it looks like my commands should be:

# mdadm --create /dev/md0 --chunk=2048 --level=6 --raid-devices=10 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk
# parted /dev/md0
    mklabel gpt
    mkpart primary 0.00TB 27.00TB
# mkfs.xfs -L /big_raid -b size=4096 -d agcount=64,sunit=4096,swidth=32768 /dev/md0p1
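After creating the array and filesystem it's worth sanity-checking that the geometry actually matches; a sketch (device names as above, expected values assume the 2 MiB chunk and 4 KiB blocks):

```shell
# Chunk size as the array sees it; should report 2048K
mdadm --detail /dev/md0 | grep -i chunk
# xfs_info reports sunit/swidth in filesystem blocks, not sectors;
# with 4 KiB blocks, expect sunit=512 blks, swidth=4096 blks
xfs_info /dev/md0p1 | grep -i sunit
```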

But I would just love any advice. I made a RAID5 array with similar settings and it was pretty slow, writing at 20 MB/s (about 1/5th the speed I could get with a single drive).

EDIT:

So I found out that the Seagate ST3000DM001 may have been the worst drive I could possibly have chosen, according to Backblaze, so first thing: don't buy them! I've had, I think, 4 out of 10 fail in a year and a half.

I built the RAID6 with the following changes and it ran at 5 times the speed, though I'm not sure how much each individual change contributed.

  • Create aligned partitions on each drive and build the RAID from them, as mentioned by the answer below

From this page

  • Disable NCQ on each drive. For each drive:

    # echo 1 > /sys/block/sd[driveletterhere]/device/queue_depth

  • Set read-ahead

    # blockdev --setra 65536 /dev/md[raidnumberhere]

  • Set stripe-cache_size for RAID

    # echo 16384 > /sys/block/md[raidnumberhere]/md/stripe_cache_size
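The per-drive steps above can be sketched as one script (the drive letters and md number are assumptions; adjust them for your system):

```shell
# Apply the tunings above to every drive in the array
for d in sda sdb sdc sdd sde sdf sdh sdi sdj sdk; do
    echo 1 > /sys/block/$d/device/queue_depth     # disable NCQ
done
blockdev --setra 65536 /dev/md0                   # read-ahead on the array
echo 16384 > /sys/block/md0/md/stripe_cache_size  # RAID stripe cache
```

Note that sysfs settings like these don't survive a reboot, so you'd want to run this from something like rc.local.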

Hope that helps anyone who finds this looking for the same thing I was! For the record, my CPU is an AMD A4-5300 and I have 8 GB of RAM, and this setup barely touches either one.

  • RAID6 is going to be slow for writes, because the OS has to calculate and store two parity blocks for every stripe written. If you want throughput, go with RAID10 instead. – Craig Watson Nov 10 '15 at 19:22
  • That's actually an interesting idea. We had originally done RAID5 due to needs for space, but a fast 13.5 TiB volume might actually be better than a slow 21.8 TiB volume now. Thanks for the idea! I'll have to ponder this now. – electron.rotoscope Nov 10 '15 at 19:51
  • Backblaze info here: https://www.backblaze.com/blog/hard-drive-reliability-stats-for-q2-2015/ and XFS creation info here: https://www.mythtv.org/wiki/Optimizing_Performance#Optimizing_XFS_on_RAID_Arrays – electron.rotoscope Nov 17 '15 at 22:07

1 Answer


I agree. Except you don't have to run parted on md0 at all: there's no need for a partition on the array, you can put the filesystem straight on it.

# mdadm --create /dev/md0 --chunk=2048 --level=6 --raid-devices=10 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk
# mkfs.xfs -L /big_raid -b size=4096 -d agcount=64,sunit=4096,swidth=32768 /dev/md0

From my experience, changing the block size doesn't improve performance much. It's always as little as 1-2%, no big deal.

Creating the RAID on the drives directly can cause problems on some systems, and worse still if you want to use it as a boot drive. Therefore I would use parted on the drives to create one GPT partition on each. The partition should be aligned, which parted supports (please follow https://www.google.cz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=parted%20alignment%20partitions), and then create md0 on /dev/sda1, /dev/sdb1, ...
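A minimal sketch of that per-drive partitioning, assuming the same ten drive letters as the question (the 0%/100% bounds let parted pick aligned start and end sectors):

```shell
# Create one aligned GPT partition on each drive
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde \
         /dev/sdf /dev/sdh /dev/sdi /dev/sdj /dev/sdk; do
    parted -s -a optimal "$d" mklabel gpt mkpart primary 0% 100%
done
# Then build the array from the partitions instead of the whole disks:
# mdadm --create /dev/md0 --chunk=2048 --level=6 --raid-devices=10 \
#     /dev/sda1 /dev/sdb1 /dev/sdc1 ... /dev/sdk1
```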

Yarik Dot