
I just replaced a faulty drive in my RAID 6 array (consisting of 8 drives), and during the recovery I noticed something strange in iostat. All drives are getting the same speeds (as expected) except for one drive (sdi), which is constantly reading faster than the rest.

It is also reading about one eighth faster, which might have something to do with the fact that there are eight drives in the array in total, but I don't know why...

This was true for the whole recovery (always the same drive reading faster than all of the rest), and looking at the total statistics for the recovery, all drives have read/written pretty much the same amount, except for sdi, which has read one eighth more.

Some iostat stats, averaged over 100 s:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             444.80        26.15         0.00       2615          0
sdb1            444.80        26.15         0.00       2615          0
sdc             445.07        26.15         0.00       2615          0
sdc1            445.07        26.15         0.00       2615          0
sdd             443.21        26.15         0.00       2615          0
sdd1            443.21        26.15         0.00       2615          0
sde             444.01        26.15         0.00       2615          0
sde1            444.01        26.15         0.00       2615          0
sdf             448.79        26.15         0.00       2615          0
sdf1            448.79        26.15         0.00       2615          0
sdg             521.66         0.00        26.15          0       2615
sdg1            521.66         0.00        26.15          0       2615
sdh             443.32        26.15         0.00       2615          0
sdh1            443.32        26.15         0.00       2615          0
sdi             369.23        29.43         0.00       2942          0
sdi1            369.23        29.43         0.00       2942          0
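(For reference, per-device throughput in this format comes from iostat's device report in megabytes; I'm not reproducing my exact invocation here, but something along these lines, with a 100-second interval, gives this kind of output:)

# device report (-d), megabytes (-m), 100-second sampling interval
iostat -d -m 100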

Can anyone offer a sensible explanation? When I discovered that it was almost exactly one eighth faster I figured that it had to do with the parity, but that really doesn't make much sense (I don't know the details of the RAID 6 implementation in mdadm, but for one thing it surely can't store all of the parity on one drive...).
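(For anyone answering: the layout and chunk size that mdadm is actually using can be inspected with something like the following; /dev/md0 is just a placeholder for the array device:)

# overall kernel view of all md arrays
cat /proc/mdstat
# layout, chunk size and per-device roles; /dev/md0 is a placeholder
mdadm --detail /dev/md0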

UPDATE: Well, I just replaced another drive (same array) and I am seeing the exact same results, but this time with a different drive reading faster (actually, it is the drive I added in the last recovery that has decided it wants to do more work).

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             388.48        24.91         0.00       2490          0
sdb1            388.48        24.91         0.00       2490          0
sdc             388.13        24.91         0.00       2491          0
sdc1            388.13        24.91         0.00       2491          0
sdd             388.32        24.91         0.00       2491          0
sdd1            388.32        24.91         0.00       2491          0
sde             388.81        24.91         0.00       2491          0
sde1            388.81        24.91         0.00       2491          0
sdf             501.07         0.00        24.89          0       2489
sdf1            501.07         0.00        24.89          0       2489
sdg             356.86        28.03         0.00       2802          0
sdg1            356.86        28.03         0.00       2802          0
sdh             387.52        24.91         0.00       2491          0
sdh1            387.52        24.91         0.00       2491          0
sdi             388.79        24.92         0.00       2491          0
sdi1            388.79        24.92         0.00       2491          0

These are 4K drives (although, as all drives do (or at least did), they still report 512-byte sectors). So I figured I might have aligned the partitions wrong somehow (what implications that might have had I don't know; it depends on how mdadm works and on the stripe size, I guess, but it's easy enough to check):

debbie:~# fdisk -l -u /dev/sd[bcdefghi] | grep ^/dev/sd
/dev/sdb1            2048  3906988207  1953493080   fd  Linux raid autodetect
/dev/sdc1            2048  3906988207  1953493080   fd  Linux raid autodetect
/dev/sdd1            2048  3906988207  1953493080   fd  Linux raid autodetect
/dev/sde1            2048  3906988207  1953493080   fd  Linux raid autodetect
/dev/sdf1            2048  3907024064  1953511008+  fd  Linux raid autodetect
/dev/sdg1            2048  3907024064  1953511008+  fd  Linux raid autodetect
/dev/sdh1            2048  3906988207  1953493080   fd  Linux raid autodetect
/dev/sdi1            2048  3906988207  1953493080   fd  Linux raid autodetect

sdf and sdg are the new drives and appear to be slightly bigger, but they all start on the same sector (all drives are of the same make and model, and on the same controller, but the new ones were bought ~6 months after the rest).
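Another thing that can be checked, in case the 512-byte emulation matters here, is what sector sizes the kernel actually reports for these drives (sdb is just an example; the same works for the other devices):

# logical and physical sector sizes as reported by the kernel
blockdev --getss --getpbsz /dev/sdb
cat /sys/block/sdb/queue/physical_block_size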

Peter

1 Answer


Curious. RAID 6 certainly doesn't store all of anything on one drive, so in general there is no reason for one drive to be read more than the others. I note there is a corresponding decrease in the TPS on that one drive; my guess would be that sdi is a different model or technology of drive, or is connected to a different model of controller, and hence can be read differently (larger reads per transaction).
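(One way to test that guess, assuming a reasonably recent sysstat, is to compare the average request size per drive using iostat's extended statistics; look at the avgrq-sz/areq-sz column for sdi versus the others. The 10-second interval is arbitrary:)

# extended per-device stats; compare average request size between sdi and another drive
iostat -x -d sdb sdi 10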

womble
  • Thanks, yeah, it doesn't seem to make sense. I figured the "one eighth" part was important, but I still can't get my head around it. All drives are of the same make and model and are connected to the same controller. I've updated the post with more, uhm, weirdness. As for the tps, that doesn't seem to be as consistent: the overall speed of the recovery process temporarily slowed down a bit for some reason earlier today, and during that time the "fast drive" was seeing more tps than the rest (it's always one eighth faster, though). – Peter Aug 06 '11 at 06:37