mdadm raid5 was added as a spare to itself

Question

I have a RAID 5 array created with mdadm. Long story short, I meant to type `mdadm --manage /dev/md127 -a /dev/sdd`, but I accidentally typed `mdadm --manage /dev/md127 -a /dev/md127 -a /dev/sdd`, which added the array to itself as a spare drive:

    Number   Major   Minor   RaidDevice State
       4       8       32        0      active sync   /dev/sdc
       1       8       16        1      active sync   /dev/sdb
       -       0        0        2      removed

       3       9      127        -      spare   /dev/md127

Now the raid isn't working properly. I can't seem to kill it, and my attempts to decrypt it just leave the computer hanging, with ^C doing nothing.

So my question is this: is there a way to fix this? I have tried moving the raid to another computer, and I have tried restarting the computer and assembling the raid with `mdadm --assemble /dev/md0 /dev/sd[b-c]`. Neither of these worked.

@MichaelHampton A raid config backup? Or a data backup? I'm guessing it's the second one, but I just want to confirm — qspitzer, Sep 11 '19 at 23:28
If you actually have a backup of the RAID metadata, then maybe you can just restore that and not have lost anything. Maybe. But you should be prepared to restore all your data from backup. — Michael Hampton, Sep 12 '19 at 02:00
@MichaelHampton I have done `mdadm --examine` on each drive and saved the output. Will that be enough? I was also thinking about using `--build` without including the `md127` drive. Could that work, or is it more likely to ruin my data? — qspitzer, Sep 12 '19 at 02:19
I'm not sure. You're in the realm of black magic there. If it were my RAID array I would have already started restoring from backup. — Michael Hampton, Sep 12 '19 at 02:22
I'm with @MichaelHampton here. Though I can't say for sure, I think it's likely that when you added the array itself as a spare, and the array was already down a drive, it would have immediately incorporated it into the array and started reconstruction. That'd overwrite the contents of... well, the array that you added as a spare. It's terrifyingly easy to screw up with `mdadm` - so many of us have! — Mike Andrews, Sep 12 '19 at 03:16
@MikeAndrews This seems like a problem that should be fixed in the program. Also, the drive never seems to be used as a main drive. It always stays as a spare, so would it still be overwriting itself? — qspitzer, Sep 12 '19 at 10:10
Oh, if it stayed as a spare, then you may have some better luck recovering data! It would have only written the array metadata at the beginning of the "user area" of the array. That'd blow away any partition table at the front and possibly some of the filesystem in the first partition. If you used GPT, there should still be a backup partition table at the end, from which you might recover. If you're doing MBR-style partitioning though, you'd need to try to figure out what your partitions were. — Mike Andrews, Sep 12 '19 at 14:35
And I totally agree, this is a bug that mdadm could prevent! — Mike Andrews, Sep 12 '19 at 14:35
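
Until mdadm has such a check, the guard can be approximated with a small wrapper. This is a minimal sketch, not anything mdadm ships: `safe_md_add` and `resolve` are hypothetical helper names, and the sketch assumes `readlink -f` is available to resolve aliases such as `/dev/md/name`. It only prints the `mdadm` command it would run.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of mdadm): refuse to add an array to itself.
# Usage: safe_md_add /dev/mdN /dev/sdX [/dev/sdY ...]

# Resolve symlinks so aliases of the same device compare equal;
# fall back to the raw path if readlink cannot resolve it.
resolve() {
    readlink -f -- "$1" 2>/dev/null || printf '%s\n' "$1"
}

safe_md_add() {
    array=$1
    shift
    array_real=$(resolve "$array")
    for dev in "$@"; do
        if [ "$(resolve "$dev")" = "$array_real" ]; then
            echo "refusing to add $dev to itself" >&2
            return 1
        fi
    done
    # Dry run: print the command. Remove the leading echo to actually run it.
    echo mdadm --manage "$array" --add "$@"
}
```

Called as `safe_md_add /dev/md127 /dev/md127 /dev/sdd`, it returns a non-zero status and prints nothing but the refusal on stderr; with only `/dev/sdd` it prints the `mdadm --manage` line.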

score 1 · Answer 1 · answered Sep 12 '19 at 11:06

Alright, so I was able to do some black-magicy stuff, and I was able to reassemble the raid without the md127 drive being a spare device. What I did was this:

1) I created an mdadm.conf file with my raid details, and put `<ignore>` in place of the device name on the ARRAY line to prevent the raid from being auto-assembled

1.5) If the raid was assembled anyway, I marked the raid as a failed device with `mdadm --manage /dev/md127 --fail /dev/md127` and then restarted the computer to disassemble the raid. I'm not sure if this step is necessary, but it didn't hurt to do it anyway

2) I manually reassembled the raid by specifying all of the drives in the raid with `mdadm --assemble /dev/md127 /dev/sdX /dev/sdY /dev/sdZ`

3) I moved all of the data to another drive not connected with the raid so that this won't happen again
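
The steps above can be sketched as a dry-run script. Treat it as an illustration under assumptions rather than a tested recovery procedure: `ARRAY_UUID` and `MEMBERS` are placeholders to fill in from your own `mdadm --examine` output, `ignore_line` and `run` are hypothetical helpers, and nothing is executed unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run sketch of steps 1 to 2: commands are printed, not executed,
# unless RUN=1 is set. The UUID and member names below are placeholders.

ARRAY_UUID=${ARRAY_UUID:-"<uuid from mdadm --examine>"}
MEMBERS=${MEMBERS:-"/dev/sdX /dev/sdY /dev/sdZ"}

# Build an mdadm.conf ARRAY line that uses <ignore> as the device name,
# so an array matching this UUID is not auto-assembled.
ignore_line() {
    printf 'ARRAY <ignore> UUID=%s\n' "$1"
}

# Print a command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

echo "# step 1: add this line to mdadm.conf"
ignore_line "$ARRAY_UUID"

echo "# step 1.5: only if the array was assembled anyway, then reboot"
run mdadm --manage /dev/md127 --fail /dev/md127

echo "# step 2: assemble from the real member disks only"
# MEMBERS is intentionally unquoted so it splits into one argument per disk.
run mdadm --assemble /dev/md127 $MEMBERS
```

Reviewing the printed commands before setting `RUN=1` gives one more chance to catch a mistyped device name, which is how this problem started in the first place.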
