3

I am on Debian with RAID1, and one of the drives seems dead.

root@rescue ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
      486279424 blocks [2/1] [U_]

md0 : active raid1 sda1[0] sdb1[1]
      2104448 blocks [2/2] [UU]

unused devices: <none>
root@rescue ~ #

Is it possible to use only the healthy hard drive? Do I need to remove the RAID? If so, how? Thanks!

MilMike
  • 206
  • 1
  • 5

2 Answers

4

It looks like /dev/sdb hasn't entirely died, but may have some intermittent faults or bad blocks. You can probably fail the partition and re-add it to the mirror on the disk that had the problem.

Here is how:

mdadm --remove /dev/md1 /dev/sdb2

(it might complain that /dev/sdb2 isn't attached; that is fine)

mdadm --add /dev/md1 /dev/sdb2

Then do a:

cat /proc/mdstat

and you can watch it rebuild, including an estimate on the time it will take.
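To follow the resync without re-running the command by hand, something like this works (watch is assumed to be installed; on Debian it comes with the procps package):

```shell
# Refresh the array status every 5 seconds until you press Ctrl-C
watch -n 5 cat /proc/mdstat
```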

See if that works. If not (/dev/sdb2 really is damaged), you need to fail and remove sdb's partitions from all mirrors (md0 and md1), pull out sdb, add a drive of identical size, partition the new drive to match, and add the partitions back to the mirrors. If you are not sure which physical drive is sdb, try this:

dd if=/dev/sdb of=/dev/null count=40000 

Assuming you have an LED on the front of your server to indicate disk activity, the drive whose light glows steadily during the above disk dump is sdb. (Or you could flip this logic around and make sda glow instead, to identify the drive you should *not* remove.) It is safe to Control-C the dd command at any time after you've figured out which disk is which. The dd command merely reads a stream off the disk and discards it; it doesn't write anything to the drive, unless you get if= and of= mixed up.
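If the re-add fails and you do end up swapping the disk, the full sequence looks roughly like this. This is a sketch, not a tested procedure: the array and partition names are taken from the /proc/mdstat output above and may differ on your system, and sfdisk's dump/restore assumes both drives use a conventional DOS partition table.

```shell
# Fail and remove sdb's partitions from both arrays before pulling the disk
mdadm --fail /dev/md0 /dev/sdb1
mdadm --remove /dev/md0 /dev/sdb1
mdadm --fail /dev/md1 /dev/sdb2
mdadm --remove /dev/md1 /dev/sdb2

# After installing the new drive, copy sda's partition table onto it
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Add the new partitions back; both arrays will resync
mdadm --add /dev/md0 /dev/sdb1
mdadm --add /dev/md1 /dev/sdb2
```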

labradort
  • 1,169
  • 1
  • 8
  • 20
  • Hopefully it will just be happy with the re-add. I've seen partitions just fail for no apparent reason, and adding them back in fixed it. – Cube_Zombie Aug 12 '10 at 20:11
  • thanks, but the server is dead now, both of the hard drives in raid1 are broken... all data lost, no backups.. as usual on friday 13th - thanks anyway! – MilMike Aug 15 '10 at 09:31
  • Bummer. However, I'd think it is very likely something else has failed. If possible, take those drives, don't wipe them, and put them into another similar system. If 64 bit, then stay 64 bit, or likewise if the hardware is 32 bit then stay with that. It will usually boot up on similar hardware. If it works, then it was actually the power supply or motherboard controller which was dead. I've seen this happen many times when it initially looks like hard drives are failing. – labradort Aug 17 '10 at 13:28
2

Yes, it is possible to use only the healthy drive; that is what is already happening. I suspect the failed partition was sdb2? You might want to run badblocks against the partition/drive that failed if you suspect it isn't really bad.
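A non-destructive check of the suspect partition might look like this (badblocks ships with e2fsprogs and defaults to a read-only scan; smartctl is from the smartmontools package, which may need to be installed):

```shell
# Read-only surface scan of the suspect partition; -s shows progress, -v is verbose
badblocks -sv /dev/sdb2

# SMART attributes and self-test log for the whole drive
smartctl -a /dev/sdb
```

Avoid badblocks' -w (write-mode) option here, since it destroys the data on the partition.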

I am not sure how you have configured the boot loader, but if it was set up properly, you should be able to pull out the failed drive and replace it.

If you are not entirely sure which drive is which, you can use a command like lshw -class disk, which should show you both the logical name of the drive and its serial number. That way you can pull out the correct drive.
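Another way to map serial numbers to device names, assuming a reasonably standard udev setup, is the by-id symlinks, which encode the model and serial in the link name:

```shell
# Each whole-disk link name contains model+serial and points at the sdX node
ls -l /dev/disk/by-id/ | grep -v part
```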

Zoredache
  • 130,897
  • 41
  • 276
  • 420