I am seeing a discrepancy between the output of mdadm --detail and mdadm --examine, and I don't understand why.

This output

mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Mar 14 18:20:52 2012
     Raid Level : raid10
     Array Size : 3662760640 (3493.08 GiB 3750.67 GB)
  Used Dev Size : 1465104256 (1397.23 GiB 1500.27 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
    Persistence : Superblock is persistent

seems to contradict this (the output is the same for every disk in the array):

mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 1f54d708:60227dd6:163c2a05:89fa2e07 (local to host)
  Creation Time : Wed Mar 14 18:20:52 2012
     Raid Level : raid10
  Used Dev Size : 1465104320 (1397.23 GiB 1500.27 GB)
     Array Size : 2930208640 (2794.46 GiB 3000.53 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
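
If my arithmetic is right, the --detail figure is what a 5-device array with 2 copies of each block would give, while the superblock figure only accounts for 4 devices (quick shell check below, using the per-device sizes from the output above):

# 5 devices * per-device size / 2 copies -> the --detail Array Size
echo $(( 5 * 1465104256 / 2 ))    # 3662760640
# 4 devices * per-device size / 2 copies -> the --examine Array Size
echo $(( 4 * 1465104320 / 2 ))    # 2930208640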

The array was created like this.

mdadm -v --create  /dev/md2 \
  --level=raid10 --layout=o2 --raid-devices=5 \
  --chunk=64 --metadata=0.90 \
 /dev/sdg2 /dev/sdf2 /dev/sde2 /dev/sdd2 /dev/sdc2 

Each of the 5 individual drives has partitions like this.

Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00057754

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            2048       34815       16384   83  Linux
/dev/sdc2           34816  2930243583  1465104384   fd  Linux raid autodetect
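
Note that the sdc2 partition is 1465104384 one-KiB blocks, while --examine reports a Used Dev Size of 1465104320 KiB. If I understand the 0.90 metadata format correctly, the 64 KiB difference is just the space reserved at the end of the partition for the superblock itself:

echo $(( 1465104384 - 1465104320 ))   # 64 KiB reserved at the end for the 0.90 superblock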

Backstory

So the SATA controller failed in a box I provide some support for. The failure was ugly, so individual drives fell out of the array a little at a time. While there are backups, they are not really done as frequently as we really need them to be. There is some data that I am trying to recover if I can.

I got additional hardware and I was able to access the drives again. The drives appear to be fine, and I can get the array and filesystem active and mounted (using read-only mode). I am able to access some data on the filesystem and have been copying that off, but I am seeing lots of errors when I try to copy the most recent data.

When I try to access that most recent data I get errors like the ones below, which makes me think that the array size discrepancy may be the problem.

Mar 14 18:26:04 server kernel: [351588.196299] dm-7: rw=0, want=6619839616, limit=6442450944
Mar 14 18:26:04 server kernel: [351588.196309] attempt to access beyond end of device
Mar 14 18:26:04 server kernel: [351588.196313] dm-7: rw=0, want=6619839616, limit=6442450944
Mar 14 18:26:04 server kernel: [351588.199260] attempt to access beyond end of device
Mar 14 18:26:04 server kernel: [351588.199264] dm-7: rw=0, want=20647626304, limit=6442450944
Mar 14 18:26:04 server kernel: [351588.202446] attempt to access beyond end of device
Mar 14 18:26:04 server kernel: [351588.202450] dm-7: rw=0, want=19973212288, limit=6442450944
Mar 14 18:26:04 server kernel: [351588.205516] attempt to access beyond end of device
Mar 14 18:26:04 server kernel: [351588.205520] dm-7: rw=0, want=8009695096, limit=6442450944
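
If I am reading these right, the limit works out to exactly 3 TiB (sectors are 512 bytes per the fdisk output above), so something is asking for blocks well past the end of a 3 TiB device-mapper device:

echo $(( 6442450944 * 512 ))            # 3298534883328 bytes
echo $(( 6442450944 * 512 / 1024**4 ))  # 3 TiB
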
Zoredache
  • What does `cat /proc/mdstat` say? And what is your RAID10 configuration? If it is a mirror of striped disks, the number of disks has to be divisible by 2. – Jan Marek Mar 15 '12 at 08:34
  • Is there a possibility that this array had 6 disks, not only 5? That would explain the different size in the MD superblock... Try adding the `missing` keyword to `mdadm` when you create the array. – Jan Marek Mar 15 '12 at 11:37
  • I suspect the LVM. Was any snapshotting in use? Does the filesystem fsck, or do these failures occur during that process? – Shane Madden Mar 17 '12 at 00:45
  • RAID10 with 5 drives usually implies 4 data plus one hot spare. If it were RAID10 with an odd number of active drives and no spares, the array would only run degraded - there is no drive 6 to mirror drive 5. – Avery Payne Jan 26 '14 at 02:52
  • @AveryPayne, if this wasn't linux you would be correct. But RAID10 on Linux is not a traditional RAID10 implementation. You don't need an even number of disks. The wikipedia article has a diagram showing how linux stores data for a 3 disk RAID10. http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 – Zoredache Jan 26 '14 at 06:31
  • The diagram. My eyes - the goggles - they do nothing! (ha) Ok, thank you for that point of enlightenment. :) – Avery Payne Jan 30 '14 at 00:42
  • @Zoredache, Just a follow-up from the Jan 26 comment: recently went to reshape an array at home from RAID6 to RAID5 (in preparation to remove a drive) and read up a bit more, including the odd but understandable ability to move from RAID1 to RAID5 with just two drives. Indeed, there's more than meets the eye. Thanks for the pointers! – Avery Payne Feb 12 '14 at 04:23

2 Answers

If you can clone the drives with dd then I would do that. Keep your original drives as untouched as possible.
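
Something along these lines, for example; /dev/sdX is just a placeholder for whatever spare disk you clone onto, and conv=noerror,sync keeps dd going past read errors (GNU ddrescue is a better choice if any of the source disks turn out to be flaky):

# clone one member disk onto a spare disk of at least the same size
# (/dev/sdX is a placeholder -- double-check the target before running this)
dd if=/dev/sdc of=/dev/sdX bs=1M conv=noerror,sync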

This is a total shot from the hip, but it is what I'd try if I were in that situation. With the cloned drives in the system, I'd erase all the RAID metadata by running

mdadm --zero-superblock /dev/sdx#

on each of the drives involved.

Then use this command to recreate the array:
mdadm -v --create /dev/md2 \
--level=raid10 --layout=o2 --raid-devices=5 \
--chunk=64 --metadata=0.90 --assume-clean \
/dev/sdg2 /dev/sdf2 /dev/sde2 /dev/sdd2 /dev/sdc2

This should get rid of all the RAID-level issues. From there you can try to remount the file systems and see what's left. If this doesn't work, re-clone your drives and try something else :)
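
After recreating the array, reactivating the LVM volume group and mounting read-only might look something like this; the volume group, logical volume, and mount point names are placeholders, since I don't know what yours are called:

vgscan                                         # rescan for the PV/VG on the rebuilt md2
vgchange -ay yourvg                            # placeholder VG name
mkdir -p /mnt/recovery
mount -o ro /dev/yourvg/yourlv /mnt/recovery   # placeholder LV name; read-only to be safe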

3dinfluence

Are you certain of the command line you used to create the array? My guess is that it was a "standard" 4-drive RAID10 array with a hot spare drive, which would explain the output from /dev/sdc2.

Can you tell us the result of:

cat /proc/mdstat
cat /etc/mdadm.conf
mdadm --examine /dev/sdx2   (for each drive)

With this you might be able to guess which drive was the hot spare, and then you will be able to reconstruct the array properly. Of course, as stated by 3dinfluence, you should duplicate the data before trying to reconfigure the array.

Edit: it is also probably not a waste of time to run smartctl -a /dev/sdx on each drive (check at the end of the output whether errors were reported), then smartctl -t long /dev/sdx, and 3 or 4 hours later a smartctl -a again, to check that the 5 disks are really fine. If one disk is reporting errors, maybe it was detected as faulty by mdadm and mdadm switched in the spare drive (again, just a guess).
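
In other words, something like this for each drive (sdc shown as an example):

smartctl -a /dev/sdc          # check the error log near the end of the output
smartctl -t long /dev/sdc     # start an extended self-test (takes a few hours)
# ...once the self-test has finished:
smartctl -a /dev/sdc          # the self-test log shows whether it completed without error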

Edit 2: if vgdisplay reports Alloc PE/Size 3.00 TiB, Free PE/Size 421.08 GiB, this means that your PV has mysteriously grown by 421 GiB. I rest my case: the "mystery" growth is a wrong configuration of your array. The real size of your array is 3T. You did not reassemble it properly, so it is corrupt. In order to reassemble it properly you need to retrieve the original config and find out which of the drives was the spare drive. Good luck.
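
For what it is worth, the allocated plus free space adds up exactly to the Array Size that --detail now reports (3.00 TiB = 3072 GiB):

awk 'BEGIN { print 3072 + 421.08 }'   # 3493.08 GiB, the --detail Array Size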

Olivier S
  • I am 100% certain that it used all 5 disks. I am pretty sure I was using o2. I have, and can kind of still access, an LVM volume which is on the MD device. vgdisplay shows `Alloc PE/Size 3.00 TiB, Free PE/Size 421.08 GiB`. `mdadm --examine` shows the same array sizes on each drive. The mdadm.conf is unremarkable; all it has is the device name and a UUID: `ARRAY /dev/md/2 UUID=4d5574e0-9a76-5869-163c-2a0589fa2e07`. – Zoredache Mar 15 '12 at 07:16
  • The array was ~3750GB like `mdadm --detail /dev/md2` was reporting. I am 100% certain of this. I had another LV there previously using up the full ~3.5TiB capacity. This array is over 3 years old. – Zoredache Mar 15 '12 at 07:27