
I set up a new server with a LUKS-encrypted RAID 5. On the former server, the bottleneck was clearly the CPU: it was a seven-year-old single-core machine and the load went up to 100%.

Now it is different. I still get poor write performance, but I cannot see where the bottleneck is.

During

root@home-le:/data# dd if=/dev/zero of=benchmark bs=100MB count=100
100+0 records out
10000000000 bytes (10 GB) copied, 775.726 s, 12.9 MB/s

I get

root@home-le:/data# iostat
Linux 2.6.38-11-server (home-le)        23.09.2011      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,22    3,58   10,02   13,56    0,00   72,61

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              66,63       795,46      8876,84  105325279 1175367302
sdc             244,12      8203,55      1523,39 1086218095  201709949
sdf             253,41      8219,63      1519,15 1088347371  201148053
sde             242,42      8172,09      1495,00 1082051932  197950373
md0             933,49        36,80      3937,60    4872631  521371476
dm-4            933,51        36,79      3938,19    4871328  521449348
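
(For what it's worth, extended iostat output taken while the dd is running makes the bottleneck easier to spot; the invocation below is a suggestion on my part and not part of the original capture:

iostat -x 5

The await, avgqu-sz and %util columns then show whether a single member disk is saturated, or whether the disks sit mostly idle while the CPU is busy with the encryption.)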

The array is in sync

md0 : active raid5 sda1[5] sdc1[0] sde1[2] sdf1[4]
      2768292864 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

and consists of four 950 GB partitions on four 1 TB WD Caviar Green drives. (The other partitions on the discs do not carry any considerable load.) The FS is ext4 with a block size of 4096.
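
(The geometry can be double-checked directly; a sketch, with device names taken from the output above, where dm-4 appears to be the LUKS mapping on top of md0 judging by the matching iostat numbers:

# chunk size, layout and member devices of the array
mdadm --detail /dev/md0

# ext4 stride/stripe-width settings, if any were chosen at mkfs time
tune2fs -l /dev/dm-4 | grep -i -E 'stride|stripe'
)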

If you cannot tell where the bottleneck is either, I would also appreciate your results from comparable arrays.

mcandril
  • Addition: I also already changed a disk (I started with one defective disk and only had a 2 TB disc until the 1 TB disc was warranty-replaced). The rebuild (which should not care about encryption?) also runs at only about 16 MB/s. For the initial build, I remember 30 MB/s. – mcandril Sep 23 '11 at 10:03
  • There's something strange with your environment. You have ~5x more reads than writes on sd[cfe]. Which drives constitute your RAID? – Paweł Brodacki Sep 23 '11 at 10:17
  • Your chunks are huge, 512KB. That's really going to hurt you. – David Schwartz Sep 23 '11 at 10:24
  • David Schwartz -- To be honest, I don't know what that is. I did not specify anything special regarding that; I built the array just with "mdadm --build /dev/md0 --level=5 --raid-devices=5 /dev/sd[a-d]1". A quick google search suggests you are right, though. Could you give a more detailed answer? – mcandril Sep 23 '11 at 10:30
    512KB chunks means that your RAID 5 stripe is 1,536KB -- for RAID 5, stripe=chunk*(drives-1). That means that small writes require reading and then writing massive amounts of data. 32KB or 64KB would have been more reasonable. – David Schwartz Sep 23 '11 at 13:35
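
(To make the arithmetic concrete: with a 512 KB chunk and 4 drives, one full RAID 5 stripe is 3 × 512 KB = 1536 KB, so any write smaller than that forces a read-modify-write cycle on the parity. Recreating the array with a smaller chunk would look roughly like this; a destructive sketch with the device names from the question, so back up first and expect a full re-sync:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sda1 /dev/sdc1 /dev/sde1 /dev/sdf1
)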

2 Answers


RAID-5 tends to have pretty low write performance, but I admit less than 13 MB/s is sub-par.

Can you try benchmarking only a single disk? I've heard horror stories about the WD Caviar Green series over the years. I haven't checked whether things have improved, but a couple of years ago the debate was about the RPM of Caviar Green drives. Some suspected they spun at around 5400 RPM rather than 7200 RPM, which made the drives very slow. Western Digital, of course, had their own explanation of the situation:

"A fine-tuned balance of spin speed, transfer rate and caching algorithms designed to deliver both significant power savings and solid performance."

Err, right.

So, can you benchmark only a single disk with all the unnecessary layers (LUKS, RAID) removed and see if it's much faster?
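
(A minimal sketch of such a test, assuming a member disk can be spared or an unused area on it can be tested; /mnt/test is a placeholder path:

# non-destructive raw read test, bypassing RAID, LUKS and the page cache
dd if=/dev/sdc of=/dev/null bs=1M count=1000 iflag=direct

# write test against a plain, unencrypted filesystem on the same disk
dd if=/dev/zero of=/mnt/test/benchmark bs=1M count=1000 oflag=direct
)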

Janne Pikkarainen
  • You might have kind of nailed it. I tested another 2 TB (even SATA III compatible!) WD Caviar Green (no RAID, no LUKS, plain ext3) in the system with bs=100MB count=100 and got 5.9 MB/s write performance. That could well be the root of the problem, but it is too crappy to be explained by 5400 RPM alone ... – mcandril Sep 23 '11 at 10:24
  • Apparently WD Caviar Green has more problems under the hood, then ... if you have some other system (and preferably another OS) available, try the drive in that. If performance sucks even there, kill the drives with fire and buy something more performant. – Janne Pikkarainen Sep 23 '11 at 10:27

May I draw your attention to BAARF, which discusses parity-RAID performance issues at length?

Aside from that, you really should have a write cache enabled with any kind of striped-with-parity datasets - otherwise any random load (no matter how small) will kill your performance immediately.

hdparm -W 1 /dev/sd[acfe]
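
(For reference, running hdparm -W without a value only queries the drive, so the setting can be verified afterwards:

hdparm -W /dev/sda
)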

BTW: hard drives are slow on random access. I mean REALLY SLOW. Most of the time nobody notices, as things are likely to be cached or sequential in a "one computer - one hard drive" scenario. But all I/O-intensive server loads suffer from this problem. I can have a RAID 10 array of eight 15K Cheetah disks (arguably the fastest midrange disks available) saturated at 100% utilization with full queues at less than 10 MB/second if the load is spread "right" (i.e. small block sizes, random write/read access) in a virtualization scenario. If you need random access, make sure you have big (write-back) caches. If you need a lot of random access, get tiered storage.
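
(To see the effect for yourself, a quick fio comparison of sequential versus small random writes illustrates it; fio is my suggestion here, not something used elsewhere in this thread, and /data/fio.test is just a scratch file on the array:

fio --name=seqwrite  --filename=/data/fio.test --size=1G --bs=1M --rw=write     --direct=1 --ioengine=libaio
fio --name=randwrite --filename=/data/fio.test --size=1G --bs=4k --rw=randwrite --direct=1 --ioengine=libaio

On a parity array with large chunks, the 4k random case will typically come out an order of magnitude slower than the sequential one.)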

the-wabbit