Ok, I'll try to keep this quick. The data on these drives isn't mission-critical, so there's no backup. Losing the data would be a bit annoying, so if I could get it back that would be neat, but if not that's fine. More than anything, this seems like a good time to explore some mdadm wizardry.
I have a RAID array that, when it was working, looked like this:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[4] sda1[2] sdd1[5] sdb1[3]
      2929731072 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/8 pages [0KB], 65536KB chunk
But one of the drives failed (sdc1[4]). Then, during the rebuild, another drive failed (sdd1[5]). The Classic. But I was a bit suspicious of this second failure; it may have just been a power blip or something. I figured that if I could assemble the array with the "failed" sdd1[5] in read-only mode, I could maybe still get some of the data off it.
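What I had in mind was roughly the following (I believe --readonly can be passed at assemble time, but I haven't actually run this exact incantation, so treat it as a sketch):

# stop whatever half-assembled md0 is lying around, then force-assemble
# read-only from the three drives that still respond, including the suspect sdd1
mdadm --stop /dev/md0
mdadm --assemble --force --readonly /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdd1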
Now it looked like:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[4](S) sda1[2] sdd1[5](F) sdb1[3]
      2929731072 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [_UU_]
      bitmap: 8/8 pages [32KB], 65536KB chunk
unused devices: <none>
Ok, so I wanted to ignore the failure and re-add sdd1[5], but the re-add brought it back as a spare instead... That's not good.
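For the record, the re-add was roughly the following (reconstructing from memory, so the exact invocation may have differed slightly):

mdadm /dev/md0 --remove /dev/sdd1   # it was marked (F)
mdadm /dev/md0 --re-add /dev/sdd1   # ...and it came back as a spare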
When I --examine all the disks, they all report the same event count, but two of them show up as Active device 1 and Active device 2, and the other shows up as spare...
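By "examine" I mean something like this, comparing the Events and Device Role lines across the three surviving disks:

for d in /dev/sda1 /dev/sdb1 /dev/sdd1; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|Device Role'
done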
I tried an --assemble --force, but that just put me back into the same state. What I want is some way to tell the drive it isn't a spare, but I'm not sure such a tool exists. So I figure the last thing I can try is recreating the array with --create and --assume-clean, to see if I can squeeze the last bit of data out of this. But that feels destructive if I get it wrong, and I probably only have one shot at it, so I'm looking for someone who knows more than I do.
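One idea I've read about for making this less of a one-shot: put device-mapper snapshot overlays on top of the partitions, so anything --create writes lands in scratch files instead of on the real disks. The sketch below is roughly what I had in mind -- untested by me, and the overlay size and /tmp paths are just placeholders:

# sparse scratch file + loop device per disk, then a dm snapshot whose
# copy-on-write store is the scratch file; writes go to the overlay and the
# underlying partition is left untouched
for d in sda1 sdb1 sdd1; do
    truncate -s 4G /tmp/overlay-$d
    loop=$(losetup -f --show /tmp/overlay-$d)
    size=$(blockdev --getsz /dev/$d)
    dmsetup create overlay-$d --table "0 $size snapshot /dev/$d $loop N 8"
done
# ...and then experiment against /dev/mapper/overlay-* instead of the real partitions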
So my first question is (A): is there any hope that this could possibly work, or am I just misguided?
After that comes (B): is there something else I could try that's a bit less drastic and has a better chance of working?
And finally (C): assuming this is my best shot, what order do I give the disks to --create in? In /proc/mdstat they were listed in sdc1[4] sda1[2] sdd1[5] sdb1[3] order, but the state when the array died was _UU_, and it was sdc1[4] and sdd1[5] that failed... In the --examine output, sda1 and sdb1 were listed as Active device 1 and Active device 2, which lines up with the _UU_ thing if the positions are zero-based, but I don't know why they'd be in that order... So if I do run a create, how do I know what order to put the disks in, and where do I put the missing one? I assume I can only mess that up once, so I'd like to take my best shot if I can.
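For concreteness, the command I have in mind is something like this, with the parameters copied from the old mdstat line and the device order being exactly the part I don't know (it could also be pointed at the overlay devices above instead of the raw partitions):

# algorithm 2 should be left-symmetric, which is the RAID5 default anyway
mdadm --create /dev/md0 --assume-clean \
      --level=5 --raid-devices=4 --metadata=1.2 \
      --chunk=512 --layout=left-symmetric \
      missing /dev/sda1 /dev/sdb1 /dev/sdd1   # <-- this order is the question

The overlay idea above is partly so I could try an ordering without touching the real superblocks.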
Thanks for reading!