
I have a Debian Squeeze installation with six 3 TB disks.

Each disk is split into four partitions for software RAID:

  • the first for the BIOS boot partition (GPT),
  • a RAID 1 for "/",
  • a RAID 5 for swap,
  • and the biggest partition, a RAID 5 for files; this is the /dev/md2 array shown below (the layout can be double-checked as sketched after this list).
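
For reference, the layout can be double-checked read-only with something like this (a minimal sketch; parted rather than fdisk because the disks use GPT):

    parted -l                 # GPT partition tables of the six 3 TB disks
    cat /proc/partitions      # block devices and partitions the kernel sees
    mdadm --detail --scan     # one line per md array, with level and UUID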

While the system was still being built, all disks lay loose on the open case. I noticed that one disk might fall off the case, so I moved two of them a bit. While doing this both disks spun down (the system was running at the time), and the RAID failed, complaining that it could not start with two missing disks:

mdadm: /dev/md2 assembled from 4 drives and 1 spare - not enough to start the array.
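
The state at that point can be captured without writing anything, roughly like this (a sketch; the device names are the ones from this setup):

    cat /proc/mdstat                     # which members the kernel still sees
    mdadm --detail /dev/md2              # may only report that the array is inactive
    for d in /dev/sd{a,b,c,d,e,f}4; do
        mdadm -E "$d"                    # per-member superblocks, dumped below
    done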

At this point there had been no write operations. I saved some information and figured out that sdd4 and sdf4 had failed, and in my opinion sdd failed before sdf:

mdadm -E /dev/sda4
/dev/sda4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e84f0346:3f5ff3f1:507b6f9c:0fa02c63
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:44:45 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842757597 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4f8851b4:001bf0c0:3aab60e0:b2c5558f

    Update Time : Tue Feb  5 17:44:45 2013
       Checksum : c0376a50 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA. ('A' == active, '.' == missing)

mdadm -E /dev/sdb4

/dev/sdb4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e84f0346:3f5ff3f1:507b6f9c:0fa02c63
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:44:45 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842757597 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c2f63fa7:768e9945:64826929:6f1f68c2

    Update Time : Tue Feb  5 17:44:45 2013
       Checksum : b3ea7d20 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA. ('A' == active, '.' == missing)

mdadm -E /dev/sdc4

/dev/sdc4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e84f0346:3f5ff3f1:507b6f9c:0fa02c63
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:44:45 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842757597 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e9861f3e:4de4d0ce:7d4b6dd7:e1215fc7

    Update Time : Tue Feb  5 17:44:45 2013
       Checksum : 86fc2eab - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA. ('A' == active, '.' == missing)

mdadm -E /dev/sdd4

/dev/sdd4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 7b99380e:51d754cf:921c68e9:7b830d6a
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:06:37 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842757597 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0da58625:14ed8675:6a7c4ba4:337d8c4b

    Update Time : Tue Feb  5 17:06:37 2013
       Checksum : 5f97164a - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAA.AA ('A' == active, '.' == missing)

mdadm -E /dev/sde4

/dev/sde4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e84f0346:3f5ff3f1:507b6f9c:0fa02c63
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:44:45 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842755584 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b70cd4f6:1594cc29:b4346929:89a5ed34

    Update Time : Tue Feb  5 17:44:45 2013
       Checksum : 5a36c944 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA. ('A' == active, '.' == missing)

mdadm -E /dev/sdf4

/dev/sdf4:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : e84f0346:3f5ff3f1:507b6f9c:0fa02c63
           Name : mfsnode1:2  (local to host mfsnode1)
  Creation Time : Tue Feb  5 17:44:45 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 5842755584 (2786.04 GiB 2991.49 GB)
     Array Size : 29213772800 (13930.21 GiB 14957.45 GB)
  Used Dev Size : 5842754560 (2786.04 GiB 2991.49 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 06202661:79792af2:6c8d02ae:769bdded

    Update Time : Tue Feb  5 17:44:45 2013
       Checksum : ca70109c - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA. ('A' == active, '.' == missing)
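
For a side-by-side comparison of the key fields, something like this works (read-only):

    for d in /dev/sd{a,b,c,d,e,f}4; do
        echo "== $d =="
        mdadm -E "$d" | grep -E 'Array UUID|Creation Time|Update Time|Events|Device Role|Array State'
    done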

I think the superblocks are OK: the chunk size is the same and all the other parameters look good too. After that I started to test some variations:

Test1:  
mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 \
                /dev/sd{a,b,c,d,e,f}4
-> e2fsck output: 708 MB, 20,603,326 lines; e2fsck aborted at the end (captured as sketched after this list)
- bad superblock or damaged partition table
- bad group descriptor checksums
- lots of invalid inodes
- aborted with lots of illegal blocks in inodes
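
The log sizes and line counts quoted for each test come from capturing the check output; a sketch of how such a check can be run without writing anything to the filesystem (the log file name is made up):

    e2fsck -fn /dev/md2 2>&1 | tee e2fsck-test1.log   # -f: force check, -n: answer "no" to everything, write nothing
    wc -lc e2fsck-test1.log                           # line count and size as quoted above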

Test2:

mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 /dev/sd{a,b,c,d,e}4 \
                missing
-> e2fsck output: 1.3 GB, 37,614,367 lines; e2fsck aborted at the end
- fell back to the original superblock
- bad superblock or damaged partition table at the beginning
- lots of invalid inodes
- aborted during inode iteration

Test3:

mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 /dev/sd{a,b,c}4 \
                missing /dev/sd{e,f}4
-> e2fsck output: 1.4 GB, 40,745,425 lines; e2fsck aborted at the end
- same errors as in Test 2
- read error while reading the next inode

Test4:

mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 \
                /dev/sd{a,b,c,f,e,d}4
-> e2fsck output: 874 MB, 25,412,000 lines; e2fsck aborted at the end
- tried the original superblock
- bad superblock or damaged partition table
- then lots of invalid group descriptor checksums
- at the end, illegal blocks in inodes / too many invalid blocks in an inode

Test5:

mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 /dev/sd{a,b,c}4 \
                missing /dev/sd{e,d}4
-> e2fsck output: 1.6 GB, 45,673,505 lines; e2fsck aborted at the end

Test6:

mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 /dev/sd{a,b,c,f,e}4 \
                missing
- tried the original superblock
- bad superblock or damaged partition table
- lots of group descriptor checksum errors
- ends with the inode table conflicting with another filesystem block
-> e2fsck output: 542 MB, 15,727,702 lines; e2fsck aborted at the end

Test 6 looks like the best one to me, but I am absolutely unsure. What do you think, and what else could I do?
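
If Test 6 is the right order, my plan would be roughly the following (a sketch based on the Test 6 command; the check and the mount are read-only and repair nothing):

    mdadm --stop /dev/md2
    mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 \
            /dev/sd{a,b,c,f,e}4 missing
    e2fsck -fn /dev/md2             # read-only filesystem check
    mount -o ro /dev/md2 /mnt       # if that looks sane, inspect the files read-only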

Please help me with further investigation and, hopefully, recovery of my data.

  • What do you mean by "while the system was being built all disks"? Do you mean you were just installing it? In that case, just wipe, reinstall and restore the data from backup. That is probably much faster. – Hennes Mar 02 '13 at 20:30
  • 7
  • Using six 3TB drives in RAID-5 is begging to lose data. http://serverfault.com/questions/192995/raid-5-with-big-sata-disks-do-or-dont – Evan Anderson Mar 02 '13 at 20:33
  • Test 6 looks closest to the 'Active Device' from your Mdadm -E outputs (counting from 0 to 5 via partitions a4/b4/c4/f4/e4/d4.) Still, if you have backups: use them. And before you recreate think about what you are doing and why. RAID 5 can be useful in *some* situations. It is not the right thing in all of them. – Hennes Mar 02 '13 at 20:50
  • The system is private and I work on it when I have time; that is why the case is still open. That is what I meant with "while the system...". The RAID and everything else was fine and running; there was no rebuild or write activity on it, the power was just on. – Sunghost Mar 02 '13 at 21:09
  • I know these facts, but thinking about it, it was a solution with the maximum disk capacity and probably a bit of failure tolerance. As long as I don't win some money I can't buy a backup solution for 40+ TB; I could also have built a RAID 10. OK, please forget all of that and help me to hopefully rescue the RAID. So you think I should give "mdadm --create --level 5 -n 6 --chunk=512 --assume-clean /dev/md2 /dev/sd{a,b,c,f,e}4 missing" a chance? After that I would mount the RAID and check the files? Perhaps I will think about a RAID 6 solution, keeping in mind that it costs 3 TB or more of disk space. – Sunghost Mar 02 '13 at 21:15
  • Hello, I had the idea to check old mails and found this: "active raid5 sda4[0] sdf4[5] sde4[4] sdd4[3](F) sdc4[2] sdb4[1]", so it looks like sdd4 failed first, and now the question is which steps I have to take to rescue it. Need help. – Sunghost Mar 03 '13 at 21:53
  • I did the create with sdd missing. Then I ran mount -a and got: "mount: wrong fs type, bad option, bad superblock on /dev/md2, missing codepage or helper program, or other error. In some cases useful info is found in syslog - try dmesg | tail or so", plus "EXT4-fs (md2): bad geometry: block count 3651722880 exceeds size of device (3651721600 blocks)", and the mount fails. OK, is the superblock on sdf4 bad? What can I do next? Thanks. – Sunghost Mar 03 '13 at 23:09

1 Answer


If you have lost two drives in your RAID 5 array, there is no recovery from that at the RAID array level; the array has been lost. You will need to resort to attempting to get the drives working again, which isn't very likely to succeed. There are drive recovery tools, like R-Studio, with which you might have a little luck. But most likely you will need to send your disks to a forensic shop that can do platter recovery. This is very costly, however.
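
Before spending money on that, it is worth checking whether the dropped drives are actually dead or were merely kicked out of the array. A quick sketch (smartmontools assumed to be installed; device names taken from the question):

    smartctl -H /dev/sdd                                          # overall SMART health verdict
    smartctl -A /dev/sdd | grep -Ei 'realloc|pending|uncorrect'   # the attributes that matter most
    dd if=/dev/sdd of=/dev/null bs=1M count=1024 iflag=direct     # does the drive read at all?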