
I am having a performance issue caused by slow I/O on a server which has two identical SSDs in a RAID1 configuration. Using "atop" I noticed that disk usage is not even:

MDD |           md2 |  busy      0%  | read       0  |  write   1717 |  MBr/s   0.00 |  MBw/s   0.58  | avio 0.00 ms  |
DSK |           sdb |  busy     99%  | read       0  |  write    842 |  MBr/s   0.00 |  MBw/s   0.58  | avio 11.8 ms  |
DSK |           sda |  busy     11%  | read       0  |  write   1058 |  MBr/s   0.00 |  MBw/s   0.58  | avio 1.01 ms 

The question is: what could be the cause of this? Why is sdb's usage so much higher? I have already noticed the same issue on a few servers, so it is very unlikely that all of them have a faulty sdb. I also checked the disks' details with hdparm to make sure they are identical. It only happens on servers running a production MySQL server; I tried to reproduce the issue by simply writing to and reading from the partition, but I was not able to get the same results that way. Thanks for any suggestions.
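One reason a plain read/write test may not reproduce this: MySQL issues many small synchronous writes (fsync on every commit), which is a very different load from streaming I/O. A hedged sketch of a closer approximation using dd with oflag=dsync (the /tmp path is a placeholder - point it at a file on the md2 filesystem and watch `iostat -x 1` on sda/sdb while it runs):

```shell
# Approximate MySQL's commit pattern: many small synchronous writes.
# oflag=dsync forces each 16 KiB block to hit the disk before the next
# one is issued, which is what stresses a slow member of the mirror.
TESTFILE=$(mktemp /tmp/syncwrite.XXXXXX)   # placeholder; use a path on md2
dd if=/dev/zero of="$TESTFILE" bs=16k count=256 oflag=dsync
rm -f "$TESTFILE"
```

If sdb's avio/await climbs far above sda's under this load but not under buffered writes, that points at sync-write handling rather than raw throughput.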

[root@CentOS-67-64-minimal ~]# cat /proc/mdstat 
Personalities : [raid1] 
md2 : active raid1 sdb3[1] sda3[0]
  232753344 blocks super 1.0 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
  524224 blocks super 1.0 [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
  16777088 blocks super 1.0 [2/2] [UU]

unused devices: <none>

[root@CentOS-67-64-minimal ~]# hdparm -tT /dev/sda

/dev/sda:
Timing cached reads:   28484 MB in  2.00 seconds = 14263.62 MB/sec
Timing buffered disk reads: 1096 MB in  3.00 seconds = 365.15 MB/sec
[root@CentOS-67-64-minimal ~]# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads:   21656 MB in  2.00 seconds = 10841.67 MB/sec
Timing buffered disk reads:  14 MB in  3.95 seconds =   3.54 MB/sec

[root@CentOS-67-64-minimal ~]# iostat -x 1
Linux 2.6.32-573.3.1.el6.x86_64 (CentOS-67-64-minimal)  2015.11.20      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
       8,96    0,13    4,22    3,93    0,00   82,75

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0,06   383,28    0,22  164,10    20,26  4275,72    26,15     2,26   13,76   3,21  52,79
sda               2,01   329,12    1,50  218,25   168,39  4275,72    20,22     0,17    0,77   0,35   7,71
md0               0,00     0,00    0,00    0,00     0,01     0,00     8,00     0,00    0,00   0,00   0,00
md1               0,00     0,00    0,00    0,00     0,01     0,00     7,62     0,00    0,00   0,00   0,00
md2               0,00     0,00    1,75  546,03   172,42  4274,38     8,12     0,00    0,00   0,00   0,00
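A quick consistency check ties these columns together: %util is roughly (r/s + w/s) × svctm / 10, so sdb's high utilisation is fully explained by its ~3 ms service time rather than by extra traffic (the write volume is identical on both disks). A small awk sketch using the sdb/sda rows above (commas in the output are locale decimal points):

```shell
# Sanity-check iostat's %util column: utilisation ~= IOPS x service time.
awk 'BEGIN {
  printf "sdb %%util ~= %.1f\n", (0.22 + 164.10) * 3.21 / 10   # iostat reports 52.79
  printf "sda %%util ~= %.1f\n", (1.50 + 218.25) * 0.35 / 10   # iostat reports  7.71
}'
```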

MadHatter edits:

Here's my iostat output under very light load; note the %util on sdb's spindle (you can distinguish my output from Nerijus' by the different hostname in the prompt, and I'll keep my edits below the (above) line):

[me@lory ~]$ iostat -x 1
[...]
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.50    0.00    0.00   99.25

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    4.00     0.00    20.00     5.00     1.00  272.50 250.00 100.00
sda               0.00     0.00    0.00    4.00     0.00    20.00     5.00     0.07   17.75  17.75   7.10
md1               0.00     0.00    0.00    5.00     0.00    40.00     8.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    4.00     0.00    32.00     8.00     2.51  272.50 250.00 100.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

And here's my hdparm output:

[me@lory ~]$ sudo hdparm -tT /dev/sda
One-time password (OATH) for `me': 

/dev/sda:
 Timing cached reads:   1730 MB in  2.00 seconds = 864.60 MB/sec
 Timing buffered disk reads: 436 MB in  3.00 seconds = 145.12 MB/sec
[me@lory ~]$ sudo hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   1580 MB in  2.00 seconds = 789.63 MB/sec
 Timing buffered disk reads:  14 MB in  8.43 seconds =   1.66 MB/sec

I can also confirm that my /proc/mdstat indicates no resyncing, and that stopping mysqld doesn't make the problem go away.

MadHatter
Nerijus
  • could you post the output from cat /proc/mdstat ? – Francesco P Nov 19 '15 at 14:36
  • updated main post with this output. There is nothing wrong with it: no failed discs; no resync in progress. – Nerijus Nov 20 '15 at 10:43
  • It's interesting you ask this; I'm having exactly the same problem with my C6.7 box this morning. Now I'm wondering if there's been some odd kernel upgrade that's karked the RAID-1 logic. – MadHatter Nov 20 '15 at 10:50
  • Yes, your RAID is OK; now check your HDD status with hdparm -tT /dev/sda and hdparm -tT /dev/sdb – Francesco P Nov 20 '15 at 10:54
  • @NerijusSpl I used different tools to reach the same conclusion, would you like me to edit my output into your question so that tools issues can be ruled out? – MadHatter Nov 20 '15 at 11:05
  • @MadHatter yes, please. Just make it clear that it is from different server. – Nerijus Nov 20 '15 at 11:19
  • also added the hdparm test; it looks the same as MadHatter's – Nerijus Nov 20 '15 at 11:23
  • Ok, so both readings are very similar. My guess is it's something to do with await and svctm, but that still doesn't make the cause clear. await - the average time (in milliseconds) for I/O requests issued to the device to be served; this includes the time spent by the requests in the queue and the time spent servicing them. svctm - the average service time (in milliseconds) for I/O requests that were issued to the device. – Nerijus Nov 20 '15 at 11:35
  • My feeling is the high await and svctimes are symptomatic of the throttled throughput to the sdb spindle, not a cause of it. I'm very curious about the hdparm output; broadly similar cached reads, but **wildly** different buffered disc reads. I'm hoping someone who understands hdparm a bit better can shed more light. – MadHatter Nov 20 '15 at 11:39

2 Answers


Possible reasons I see:

  1. Disk (and maybe controller) caches - they never work in exactly the same way
  2. Different firmware versions
  3. If you don't use TRIM at all levels, SSD speed drops: from the point of view of the controller inside the disk, the disk is full, and a full SSD is slower. Benchmarks usually show that disk speed depends on how much of it has been used.
  4. Related to the previous point - the allocation location matters; some/most SSDs have a RAID-like internal structure, so the maximal speed depends on where the data is written
  5. Different hardware on the path from disk to bus (cable, link speed, controller etc.)
  6. The OS block cache - maybe there is not enough memory to balance the cache for both disks
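Point 3 can be checked directly: the kernel exposes whether a device advertises TRIM via sysfs (a zero discard_granularity means the device does not advertise it). A minimal sketch, wrapped in a function so it can be pointed at any sysfs tree; the sda/sdb names match this thread:

```shell
# Report each disk's advertised discard (TRIM) granularity.
# A value of 0 means the device does not advertise TRIM support.
check_discard() {
  sysblock=$1; shift
  for d in "$@"; do
    f="$sysblock/$d/queue/discard_granularity"
    if [ -r "$f" ]; then
      echo "$d discard_granularity: $(cat "$f")"
    else
      echo "$d: no discard info"
    fi
  done
}
# On the affected box:
# check_discard /sys/block sda sdb
```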

hdparm -tT isn't going to give you real results anyway: it's buffered, it's not always as transparent as we expect, and it doesn't necessarily expose how the disks behave internally, especially over 2 seconds.

Probably, they're fine.

GioMac
  • I wrote the same thing: "Using hdparm -tT we ask for 2 kinds of test: the first one uses the cache, the second one is without any prior caching of data, so directly on the disk. So probably it's mechanical degradation (that's my suspicion), or the disks are different, or the disks have different firmware on board" – Francesco P Nov 20 '15 at 16:40
  • I recommend enabling TRIM, running `fstrim` over the entire disk, checking the firmware etc., and only after that testing with hdparm - but, again, don't always trust `hdparm`. To be sure the results are real, use `iozone` or similar software. – GioMac Nov 20 '15 at 16:42

We have figured out that your sdb is not healthy - this is my answer. Take a look: you should obtain values like these (tested on 3 different production servers with high I/O)

1-Timing buffered disk reads: 560 MB in  3.00 seconds = 186.43 MB/sec
2-Timing buffered disk reads: 276 MB in  3.09 seconds =  89.23 MB/sec
3-Timing buffered disk reads: 326 MB in  3.00 seconds = 108.66 MB/sec
5-Timing buffered disk reads: 528 MB in  3.00 seconds = 175.97 MB/sec
6-Timing buffered disk reads: 528 MB in  3.00 seconds = 175.94 MB/sec

So your 1.66 MB/s is a long way off these, and that produces overload on your system. I hope it helps!

Francesco P
  • So where do you think the problem is? If it's hardware, why have a whole bunch of the OP's servers *all* manifested the problem? If it's software, why do you say that "*SBD is not in health"*? Also, what OS/distro/version and kernel version did your test data come from? – MadHatter Nov 20 '15 at 13:21
  • Using hdparm -tT we ask for 2 kinds of test: the first one uses the cache, the second one is without any prior caching of data, so directly on the disk. So probably it's mechanical degradation (that's my suspicion), or the disks are different, or the disks have different firmware on board – Francesco P Nov 20 '15 at 13:46
  • Yes, I know that. Just because you're removing the OS cache from the equation doesn't mean you're not still going through the kernel. I agree that you have identified a *suspect*, but we had that already. I'm looking for either more tests, or a more perceptive analysis of the data we already have, to let me distinguish between the two main suspects. A judgement call is all very well, but I can make one of those myself. It's Nerijus' question, though; (s)he may be satisfied with your answer. – MadHatter Nov 20 '15 at 13:49
  • could you post and check the firmware of your disks with hdparm -I /dev/sda ? – Francesco P Nov 20 '15 at 13:54
  • Done; note also that `smartctl -t long /dev/sdb` is completing at the normal rate, suggesting that the HDD hardware is perfectly capable of supporting normal-speed reads. – MadHatter Nov 20 '15 at 14:11
  • MadHatter, how can I compare if you post just one disk? Please run hdparm -I /dev/sda as well – Francesco P Nov 20 '15 at 14:29
  • Hey, I gave you what you asked for ! If you want to compare outputs, I think you should post a comment to the main question, and wait for Nerijus; my HDDs aren't the same model/size/vendor, so I'm not sure how comparable my `hdparm` outputs are. – MadHatter Nov 20 '15 at 14:34
  • Your disks are different????? Please don't care....I'm wasting my time – Francesco P Nov 20 '15 at 14:38
  • The post with hdparm -I /dev/sdb has disappeared... it's a mystery... – Francesco P Nov 20 '15 at 14:48
  • Not at all. I edited it out once you made it clear that it was useless to you for diagnostic purposes. The question's long enough already, without having surplus information in it. You can still see the information in the post's history; SF is a wiki, after all. – MadHatter Nov 20 '15 at 15:01
  • I don't think so ... – Francesco P Nov 20 '15 at 15:07
  • On my servers hdparm -I /dev/sd* outputs are identical on both discs except for serials and uniqueIDs. – Nerijus Nov 23 '15 at 06:33