3

This is silly: this has happened before, I figured out how to fix it, and it was fine.

I'm running 4 500GB SATA drives in a RAID-5 on Ubuntu 7.10 server. One of the disks failed (actually I think it's one of the connectors in the hot-swap cage) and it's been running off of three disks while I find a replacement HDD or further diagnose the problem.

Now, before you read any further, NO I do not have backups and the information is not super important, just nice to have.

Anyway, once before I had some kind of hardware hiccup (maybe the power went out or something) and had problems recovering the array. It wasn't that one of the disks failed; it was something else.

I was able to simply add back in the second "failed" disk and in a few minutes, I was back up and running. Maybe I had to run some kind of filesystem check, I don't know.

I spent hours, if not days, figuring out how to do it last time and have since forgotten.

The crux of the issue is that if I run mdadm --examine on sdb, sdc, and sdd, sdd thinks it's still part of the array, but the superblock info on sdb and sdc lists sdd as removed.

sda is the disk that failed long before; it's listed correctly in all of them as faulty/removed.
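
For reference, the superblock info I'm comparing comes from running something like the following on each member (device names as above):

mdadm --examine /dev/sdb
mdadm --examine /dev/sdc
mdadm --examine /dev/sdd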

TIA. The server in question is not on the internet, so it's not possible to C&P the output of various commands onto the forum.

I know, by now a lot of you probably think I'm a nitwit, or worse. However, I do recollect that once I figured out the series of commands to run, it was a fairly straightforward procedure and it worked great.

Inovagent
  • 33
  • 1
  • 4
  • 1
    Even if you do get the RAID up and running again there is no good way to know that your data isn't corrupted, so if there is anything important that you can verify the integrity of you should do so. – Amok Oct 16 '09 at 21:00

6 Answers

3

Provided the drives have not actually failed but have merely become temporarily unavailable or otherwise come out of sync, you can try to force the raid online, ignoring the event count/timestamp of each member.

By doing this you run the risk of corrupting data, especially if you don't know which drive went offline last - but it sounds like you have little choice.

Read up on the various ways to use the --force option in the mdadm man page.

If one of the drives has actually failed and another is out of sync, you can still bring the raid online by supplying "missing" as the device ID for the failed drive, combined with the --force option. This should start the raid as degraded.
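
A minimal sketch of that (assuming the array is /dev/md0 and using the device names from the question, with the long-dead sda simply left out so the array comes up degraded):

mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd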

Roy
  • 4,376
  • 4
  • 36
  • 53
  • mdadm should have been sending you e-mails about the failures. You should forcibly reassemble the array exactly as it was in the e-mail before the second failure -- order is important! I've had to do that, and it worked just fine. – divegeek Oct 17 '09 at 14:51
1

Is RAID5 supposed to recover from a two-disk failure? I thought it was not supposed to. What you are looking for is probably the commands to hot-remove and hot-add drives to the raid array.

mdadm --remove /dev/md0 /dev/sdX  # hot-remove the member (it must already be marked faulty)
mdadm --add /dev/md0 /dev/sdX     # hot-add the replacement (or the same drive) back
sybreon
  • 7,405
  • 1
  • 21
  • 20
  • 3
    No, it doesn't support two disk failure. – ConcernedOfTunbridgeWells Oct 06 '09 at 23:58
  • 1
    Fortunately, with mdadm, if neither drive has failed completely, you can force the raid to be assembled and usually recover most if not all of the data. Usually there are only a handful of unreadable sectors, and you can even rebuild the raid in stages: if the broken sectors are at the beginning of drive A and at the end of drive B, first assemble the raid with the working drives + drive B, and after the halfway point assemble it with the working drives + drive A. Though always image the drives first if possible. – Raynet Oct 07 '09 at 00:45
  • I see. Did not know that. Thanks. – sybreon Oct 08 '09 at 00:21
1

If all else fails, you could use raidextract: http://www.chiark.greenend.org.uk/~peterb/linux/raidextract/

sendmoreinfo
  • 1,772
  • 13
  • 34
0

You could do:

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdX /dev/sdY...

just remember to give the drives in the same order in which they were originally created, and with the same stripe size, etc. Also, I suggest imaging the drives first.
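
For the imaging step, something along these lines should work (the destination path and block size here are just placeholders; conv=noerror,sync keeps dd going past unreadable sectors):

dd if=/dev/sdb of=/mnt/backup/sdb.img bs=64k conv=noerror,sync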

Raynet
  • 511
  • 2
  • 4
  • 11
0

You could try:

mdadm --create /dev/md0 --level=5 --raid-devices=4 missing /dev/sd{b..d}

which I adapted from a LinuxQuestions thread and an Ubuntu thread.
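
One caveat worth flagging: --create writes new superblocks, so the device order, chunk size, and metadata version need to match the original array. A rough way to check them against a surviving member (field names may vary slightly between mdadm versions):

mdadm --examine /dev/sdb | grep -E 'Version|Chunk|Layout'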

Zypher
  • 37,405
  • 5
  • 53
  • 95
James Cassell
  • 201
  • 1
  • 4
0

Thanks for the help. I tried explicitly stating the members with which to assemble, and I would get errors like "missing: device not found".

So I just tried forcing a start of the array with --force, and it worked like a charm. No need to remember what order the devices were in or anything like that.
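
In case it helps the next person, the forced start was something like this (a sketch from memory; the exact invocation may have differed slightly):

mdadm --assemble --force /dev/md0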

Inovagent
  • 33
  • 1
  • 4