
I have a degraded array with 8 disks.

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-5    REBUILDING     26%     -       64K     1629.74   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     232.88 GB   488397168     VDB41BT4DM3Z6C
p1     OK               u0     232.88 GB   488397168     VDB41BT4CMARDC
p2     DEGRADED         u0     232.88 GB   488397168     VDB41DT4EGWREC
p3     OK               u0     232.88 GB   488397168     VDB41BT4CHU1RC
p4     OK               u0     232.88 GB   488397168     VFA100R1CGR0LB
p5     DEVICE-ERROR     u0     232.88 GB   488397168     VDB41BT4CMJ5MC
p6     OK               u0     232.88 GB   488397168     VDB41BT4CMARYC
p7     OK               u0     232.88 GB   488397168     VDB41BT4CMJJHC

I replaced the failed disk at p2 and the rebuild started with no problems, but around 16% into the rebuild the disk at p5 threw a DEVICE-ERROR, which paused the rebuild process.

When I rescan (tw_cli /c3 rescan), the DEVICE-ERROR disappears and the rebuild starts again. Around 26%, this DEVICE-ERROR appears again and this time breaks the rebuild process, which starts over from 0%.
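To keep an eye on which ports drop out between rescans, the port listing can be checked mechanically. This is only a rough sketch: it parses text pasted from the listing above and does not call tw_cli itself, and the names `PORT_TABLE` and `unhealthy_ports` are made up for illustration.

```python
import re

# Port listing pasted verbatim from the question.
PORT_TABLE = """\
p0     OK               u0     232.88 GB   488397168     VDB41BT4DM3Z6C
p1     OK               u0     232.88 GB   488397168     VDB41BT4CMARDC
p2     DEGRADED         u0     232.88 GB   488397168     VDB41DT4EGWREC
p3     OK               u0     232.88 GB   488397168     VDB41BT4CHU1RC
p4     OK               u0     232.88 GB   488397168     VFA100R1CGR0LB
p5     DEVICE-ERROR     u0     232.88 GB   488397168     VDB41BT4CMJ5MC
p6     OK               u0     232.88 GB   488397168     VDB41BT4CMARYC
p7     OK               u0     232.88 GB   488397168     VDB41BT4CMJJHC
"""

# Port name, status word, unit name; the remaining columns are ignored.
PORT_LINE = re.compile(r"^(p\d+)\s+(\S+)\s+(u\d+)\s")


def unhealthy_ports(table_text):
    """Return (port, status) pairs for every port whose status is not OK."""
    flagged = []
    for line in table_text.splitlines():
        match = PORT_LINE.match(line.strip())
        if match and match.group(2) != "OK":
            flagged.append((match.group(1), match.group(2)))
    return flagged


if __name__ == "__main__":
    for port, status in unhealthy_ports(PORT_TABLE):
        print(f"{port}: {status}")
```

Run against a fresh paste after each rescan, this shows at a glance whether p5 is the only port flipping to an error state.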

This has been happening for a week now, and I can't rebuild the array. Is there any way to ignore this DEVICE-ERROR just until the array is rebuilt?

HopelessN00b
  • RAID is not backup, it just reduces downtime in the event of a single device failure. If you have any failure other than a failure of a single disk, you will have to recover from a backup. – David Schwartz Feb 10 '14 at 14:09

1 Answer


Yeah, you're doing it wrong. You replace the failed disk, then you rebuild the array. Of course it's not working now. You're trying to rebuild data onto a bad disk. That's not gonna work.

I would also suggest that RAID5 with 8 disks is a bad idea in this day and age.

Use RAID6, or at the very least, have a hot spare. The disks aren't large, so you might be able to get away with the setup you have now, but you've also introduced a non-trivial chance that the rebuild process will cause another disk to fail (and destroy the array).
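To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The disk count, per-disk size and block count come from the listing in the question. The 512-byte block size and the unrecoverable-read-error rate of 1 in 10^14 bits are assumptions (the latter is a commonly quoted consumer-drive spec figure, not something measured on these disks), and the model only covers read errors, not whole-disk failures.

```python
import math

DISKS = 8
DISK_GB = 232.88              # per-disk size from the port listing
BLOCKS_PER_DISK = 488397168   # per-disk block count from the port listing
BYTES_PER_BLOCK = 512         # assumption: 512-byte blocks
URE_PER_BIT = 1e-14           # assumption: quoted spec-sheet rate, not measured


def usable_gb(disks, disk_gb, parity_disks):
    """Usable capacity when parity_disks worth of space holds parity."""
    return (disks - parity_disks) * disk_gb


def rebuild_read_bits(disks, blocks, bytes_per_block):
    """Bits read during a RAID 5 rebuild: every surviving disk, in full."""
    return (disks - 1) * blocks * bytes_per_block * 8


def p_read_error(bits, ure_per_bit):
    """Chance of at least one unrecoverable read error: 1 - (1 - p)^bits."""
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    bits = rebuild_read_bits(DISKS, BLOCKS_PER_DISK, BYTES_PER_BLOCK)
    print("RAID 5 usable GB:", round(usable_gb(DISKS, DISK_GB, 1), 2))
    print("RAID 6 usable GB:", round(usable_gb(DISKS, DISK_GB, 2), 2))
    print("P(read error during rebuild):", round(p_read_error(bits, URE_PER_BIT), 3))
```

Under these assumptions RAID 6 costs one more disk of capacity, and a full RAID 5 rebuild has a chance of roughly 13% of hitting at least one unreadable sector, which RAID 5 cannot recover from while it is already degraded.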


Based on your updated information, you're probably out of luck with regards to repairing this array.

Before admitting defeat, however, it might be a good idea to scan the disk at p5 for bad blocks or sectors, just in case the DEVICE-ERROR is as simple as that. If it is, you repair the error, continue the rebuild, and then replace disk p5 and rebuild again.

Assuming that's not enough, the best approach at this point is to copy the data off the array (or restore from backups). If you don't have backups, some of that data will be corrupted or lost: at a minimum, whatever data triggers the DEVICE-ERROR on p5 when you try to read it. You may have to manually exclude those files or directories from the copy process. (It can be a whole lot worse than that, of course, but either way, just do the best you can.)
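One way to avoid excluding files by hand is to let the copy skip anything that fails to read and report it afterwards. The following is a minimal sketch, not a tested recovery tool: `salvage_copy` is a made-up helper name, and the source and destination paths in the usage line are placeholders.

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy src_root into dst_root, skipping files that raise I/O errors.

    Returns the list of source paths that could not be copied.
    """
    failed = []

    def note_walk_error(err):
        # Directories that cannot be listed are recorded, not fatal.
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError:
                # Read errors from the failing disk surface here as OSError.
                failed.append(src)
    return failed


if __name__ == "__main__":
    # Placeholder paths: adjust to the real mount points.
    for path in salvage_copy("/mnt/array", "/mnt/rescue"):
        print("could not copy:", path)
```

A file that fails partway through may leave a truncated copy at the destination, so the returned list is the set of files to treat as lost or to restore from elsewhere.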

Once the data's safe, or you've gotten as much of it off as you'll be able to, recreate the array in a better format before copying the data back. I personally wouldn't use anything other than RAID 1/10 or 6/60 these days. That's ultimately up to you, but hopefully this has taught you a lesson about why RAID5 isn't a good idea.

HopelessN00b
  • If I replace the p5 disk that throws the DEVICE-ERROR, I will lose my data since this is RAID5, or am I wrong? Maybe I didn't explain it well. p2 was broken and I replaced it. But now it can't rebuild because p5 starts to throw DEVICE-ERROR. If I replace p5 too, I will lose the data. – Alex Angelov Feb 10 '14 at 11:32
  • As I mentioned, p2 needs to rebuild. All other disks are not OK. The p5 disk throws DEVICE-ERROR and stops the rebuild process. I'll edit the paste from the console to show what happens. – Alex Angelov Feb 10 '14 at 11:36
  • @AlexAngelov It sounds like you're probably out of luck. I would try to copy the data off somewhere (and I'm betting some of it will be corrupted/lost - the data you're getting a DEVICE-ERROR from p5 on), and then recreate the array in a more resilient format. – HopelessN00b Feb 10 '14 at 11:41
  • The problem is that this controller is in LVM with two more controllers. – Alex Angelov Feb 10 '14 at 11:45
  • And the data seems to be ok by the way. – Alex Angelov Feb 10 '14 at 11:49