
I have a RAID 5 array consisting of six 1 TB drives. Recently, due to hardware changes, I needed to rewrite the bootloader on each drive.

This is where the problem starts: while in the Debian rescue system, an mdadm --create was run on the array. So far, a bad situation.

I've tried to reassemble the array, but the original settings are no longer available. When I run

mdadm --examine --scan

the output only shows the array created on 27 April 2015; the original RAID was created in December 2012.

I've realized that the order of the drives in the create command is important, which puts me in the position of having to probe every combination. With six drives there are 6! = 720 possible orderings.

I automated this by iterating over the partitions with:

mdadm --create /dev/md0 --readonly --level=5 --assume-clean --raid-devices=6 /dev/sdd2 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdf2 /dev/sde2

Only 44 combinations produced working LVM records on top of the array. I thought I had it.
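
For reference, a rough sketch of the kind of permutation loop this amounts to. The partition names are taken from the command above; the candidate log path and the pvscan check are arbitrary choices on my part, and this is better run against the read-only overlays described further down:

#!/bin/bash
# Try every ordering of the six member partitions, create the array
# read-only with --assume-clean, and note orders where LVM sees a PV.
parts=(/dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2)
log=/root/candidates.txt

try_order() {
    mdadm --stop /dev/md0 2>/dev/null
    # pipe "yes" so mdadm's "Continue creating array?" prompt is answered
    yes | mdadm --create /dev/md0 --readonly --level=5 --assume-clean \
        --raid-devices=6 "$@" >/dev/null 2>&1 || return
    sleep 2    # give udev/LVM a moment to scan the new array
    if pvscan 2>/dev/null | grep -q '/dev/md0'; then
        echo "candidate order: $*" >> "$log"
    fi
}

# recursively generate all 6! = 720 permutations
permute() {
    local -a chosen=("${!1}")
    local -a remaining=("${!2}")
    if [ ${#remaining[@]} -eq 0 ]; then
        try_order "${chosen[@]}"
        return
    fi
    local i
    for i in "${!remaining[@]}"; do
        local -a next=("${chosen[@]}" "${remaining[$i]}")
        local -a rest=("${remaining[@]:0:$i}" "${remaining[@]:$((i+1))}")
        permute next[@] rest[@]
    done
}

empty=()
permute empty[@] parts[@]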

The actual problem starts now. When I run:

pvscan

the volume group is found and activated, and all three logical volumes are listed:

nas-root
nas-swap
nas-storage

The problem is that with every one of the 44 combinations I cannot mount the filesystem.

Mounting fails with an error that the NTFS signature is invalid, but there should be an ext3/ext4 filesystem on it.

Could this happen if the array is assembled with the drives in the right order but with the wrong stripe (chunk) size?

As I created the array in December 2012, I assume I used the default settings, i.e. a 512 KB chunk size and left-symmetric RAID 5.

Is it safe for the underlying data if I run multiple

mdadm --create ... --chunk=X /dev/sd* (in different drive orders)

Additional Note:

vgscan with -d reports that it is running on a degraded RAID. Could that be the problem, and if so, how do I fix it?

Additional Help:

To allow full testing, I've created a read-only overlay on the RAID (as I don't have the space to image all of the 1 TB disks).

For other users this might be useful:

#!/bin/bash
# Create a copy-on-write overlay for a block device so that all writes
# go to a sparse file instead of the original disk.
dev=$1
tmp="/tmp"
if [ -z "$dev" ]; then
    echo "Usage: $0 /dev/sdx"
    echo "Overlays are placed in $tmp"
    exit 1
fi

base=$(basename "$dev")
ovl="$tmp/overlay.$base"
if [ -e "$ovl" ]; then
    rm -f "$ovl"
fi
# sparse 50 GB file that will hold the copy-on-write data
truncate -s 50G "$ovl"
newdev="$base-ovl"
# device size in 512-byte sectors, as expected by dmsetup
size=$(blockdev --getsize "$dev")
# you need to have enough loop devices available
loop=$(losetup -f --show "$ovl")

# snapshot target: <origin> <COW device> P(ersistent) <chunk size in sectors>
printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdev"
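
A hypothetical usage, assuming the script is saved as overlay.sh and the array members are the six sd[a-f]2 partitions from above; the overlays then show up under /dev/mapper and can be fed to mdadm instead of the real partitions:

for p in /dev/sd[a-f]2; do ./overlay.sh "$p"; done
mdadm --create /dev/md0 --readonly --level=5 --assume-clean --raid-devices=6 \
      /dev/mapper/sdd2-ovl /dev/mapper/sda2-ovl /dev/mapper/sdb2-ovl \
      /dev/mapper/sdc2-ovl /dev/mapper/sdf2-ovl /dev/mapper/sde2-ovl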

You can also bind an image file to a loopback device and put an overlay on top of it.
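
A hypothetical example of that, assuming a raw disk image at /tmp/disk.img and the overlay script saved as overlay.sh:

imgloop=$(losetup -f --show /tmp/disk.img)   # expose the image as a block device
./overlay.sh "$imgloop"                      # overlay it like any other device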

What helped me so far:

roalter
  • Not an answer but for quite a few years it's been considered 'worst practice' to use RAID 5 at all and especially on 1TB disks or larger. The reason is dull and maths based but in a large R5 array when you have to replace a single disk you're essentially guaranteeing to introduce an unrecoverable read-error - there's LOTS about this on this site but generally we only recommend R1/10 and R6/60. Sorry to moan about this but I thought you'd be interested. Hope someone here can help. – Chopper3 Apr 29 '15 at 09:25

1 Answer


When assembling an array, both drive order and chunk size are of the utmost importance. Keep in mind that, while Linux software RAID switched to 512 KB chunks relatively recently, it used 64 KB chunks some years ago.

This means your previous array was probably created with the old 64 KB default. Try to assemble the array using 64 KB chunks and report back your results.
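
For example, keeping the drive order from the question's own command (substitute whatever order turns out to be correct), the retry could look roughly like this; mdadm's --chunk value is given in kibibytes, so --chunk=64 selects 64 KB chunks:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --readonly --assume-clean --level=5 --chunk=64 \
      --raid-devices=6 /dev/sdd2 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdf2 /dev/sde2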

shodanshok
  • What happens when I vary the chunk size? Does it physically change something on the drive? I know that the metadata is written on the RAID drives, but beyond that? – roalter Apr 29 '15 at 10:49
  • An MD assemble should alter only its metadata portion. However, activating a volume group and/or a logical volume does change _its_ metadata, which (if the array is not correctly assembled) can theoretically change some wrong data sector. Anyway, you don't have much else to try, especially given your previous attempts. So I would give that chunk-size ballet a try. If it succeeds, please do _not_ mount your filesystem read/write - mount it in read-only mode. – shodanshok Apr 29 '15 at 11:09
  • Tested it with different chunk sizes. The best match was with 64k. Looks like the change mentioned above. – roalter Apr 29 '15 at 21:17
  • But still the partition is unable to be mounted. Hm. Is there something else possible to vary? – roalter Apr 29 '15 at 21:24
  • Thanks to the comment of @shodanshok I've got the RAID running again. Either Debian's mdadm changed its defaults or I accidentally tuned it to use 64 KB chunks back then. **Solution:** I tested all possible combinations using a Python script. Whenever an array was assembled, I waited for the LVM to settle and ran dumpe2fs on the LVM partition. Of about 2000 possibilities, 24 had a valid ext4 superblock, and only 6 of those had a valid journal. Using testdisk I found the right constellation. Thanks! – roalter May 12 '15 at 07:57
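
For completeness, a minimal sketch of the per-assembly check described in that last comment; the volume group and LV names are assumptions based on the pvscan output in the question:

# run after each candidate mdadm --create:
vgchange -ay nas >/dev/null 2>&1     # activate the VG; "nas" assumed from nas-root/nas-swap/nas-storage
sleep 2                              # let udev create the LV device nodes
if dumpe2fs -h /dev/nas/nas-root >/dev/null 2>&1; then
    echo "valid ext superblock with this drive order / chunk size"
fi
vgchange -an nas >/dev/null 2>&1
mdadm --stop /dev/md0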