
I have a 7 x 14 TB RAID5 array in my CentOS 7 workstation. Last week one of the drives (/dev/sde) was flagged as failing by SMART. I used mdadm to mark the drive as faulty and remove it from the array, and ... long story short ... I ended up pulling out the wrong drive!

Now CentOS boots into emergency mode (my operating system lives on a drive outside the array), but I can still run mdadm to inspect the array. It seems my /dev/md127 array is inactive, with all drives marked as spares:

cat /proc/mdstat
Personalities :
md127 : inactive sdc[6](S) sdf[9](S) sdg[10](S) sde[8](S) sdd[7](S) sdb[5](S) sdh[11](S)
95705752576 blocks super 1.2

unused devices: <none>

For some reason, it shows up here as raid0:

mdadm -D /dev/md127

/dev/md127:
Version : 1.2
Raid Level : raid0
Total Devices : 7
Persistence : Superblock is persistent

State : inactive
Working Devices : 7

Name : c103950:127
UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Events : 539502

Number  Major   Minor   RaidDevice
   -      8   16        -        /dev/sdb
   -      8   32        -        /dev/sdc
   -      8   48        -        /dev/sdd
   -      8   64        -        /dev/sde
   -      8   80        -        /dev/sdf
   -      8   96        -        /dev/sdg
   -      8  112        -        /dev/sdh

And when I examine the individual drives:

mdadm -E /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 136b95a5:1589d83d:bdb059dd:e2e9e02f

Update Time : Thu Jul 15 12:47:37 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : 4e727166 - correct
Events : 539502

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 1
Array State : AAAA..A ('A' == active, '.' == missing, 'R' == replacing)

****** 

mdadm -E /dev/sdc
/dev/sdc:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 64cac230:bc1e2bf5:65323067:5439f101

Update Time : Thu Jul 15 12:47:37 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : ecd93778 - correct
Events : 539502

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 6
Array State : AAAA..A ('A' == active, '.' == missing, 'R' == replacing)

******

mdadm -E /dev/sdd
/dev/sdd:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 2dd7e6d6:6c035b33:0072796b:d3685558

Update Time : Thu Jul 15 12:47:37 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : 2bda98d - correct
Events : 539502

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : AAAA..A ('A' == active, '.' == missing, 'R' == replacing)

******

mdadm -E /dev/sde
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : active
Device UUID : 8e6bd6de:15483efa:82c1917d:569ee387

Update Time : Thu Jul 13 10:30:54 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : c050eb4 - correct
Events : 539489

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 4
Array State : AAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

******

mdadm -E /dev/sdf
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 93452dc8:3fba28ce:c7d33d00:7c1838fd

Update Time : Thu Jul 15 12:47:37 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : e995ceb8 - correct
Events : 539502

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : AAAA..A ('A' == active, '.' == missing, 'R' == replacing)

******

mdadm -E /dev/sdg
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 48fe7b1b:751e6993:4eb73b66:a1313185

Update Time : Thu Jul 15 12:47:37 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : f81be84f - correct
Events : 539502

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 3
Array State : AAAA..A ('A' == active, '.' == missing, 'R' == replacing)

******

mdadm -E /dev/sdh
/dev/sdh:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : a6f44e2c:352b1ea0:bd25d626:cac0177c
Name : c103950:127
Creation Time : Thu Jul 26 12:21:27 2018
Raid Level : raid5
Raid Devices : 7

Avail Dev Size : 27344500736 sectors (13038.87 GiB 14000.38 GB)
Array Size : 82033502208 KiB (78233.24 GiB 84002.31 GB)
Data Offset : 264192 sectors
Super Offset : 8 sectors
Unused Space : before=264112 sectors, after=0 sectors
State : clean
Device UUID : 80448326:c8b82624:a8e31b97:18246b58

Update Time : Thu Jul 15 12:04:35 2021
Bad Block Log : 512 entries available at offset 32 sectors
Checksum : 9800dd88 - correct
Events : 539497

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 5
Array State : AAAA.AA ('A' == active, '.' == missing, 'R' == replacing)

******

/dev/sde is the faulty drive, while /dev/sdh is the one I pulled by mistake. Note the difference in their event counts and update times. I now want to reassemble the array, and I am wondering what the safest way to do so is.

Please help! Thank you for reading.

  • Oh Great, you had Raid ZERO - restore the backup, well done :-) – djdomi Jul 19 '21 at 09:27
  • Can you try with `mdadm --incremental /dev/sd[abcdfgh]`? – shodanshok Jul 19 '21 at 13:19
  • Mdadm should prevent you from making a mistake, so long as you do NOT use `--force`. It's the use of `--force` where people run into trouble. You're looking to assemble the array including the drive you pulled, but without the drive that you `--fail`ed out. Then you'll `--re-add` the drive you `--fail`ed once the array is up and running. I agree with @shodanshok, you may be able to simply use incremental assembly to get back up and running. – Mike Andrews Jul 19 '21 at 17:28
  • Thanks for your answers. --incremental seems to be exactly what I need, but I am still worried that it will try to build this as raid0, because that is what it shows now when I check with mdadm -D. Can I do "mdadm --incremental --level=5 /dev/sd[abcdfgh]"? Or should I do "mdadm --create --verbose /dev/md127 --level=5 /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh"? – lalmagor Jul 19 '21 at 17:47
  • I think `--incremental` does not permit specifying the RAID level. Anyway, your HDD superblocks seem to correctly describe a raid5 array, so I would try `--incremental` (***without*** forcing anything) to start the array. – shodanshok Jul 19 '21 at 20:34
  • I used "mdadm --stop /dev/md127" and then was able to run "mdadm --incremental" on each of my six good drives, but it still says "not enough to start". When I run "mdadm -D /dev/md127" it is still the same, with all drives empty and the level shown as raid0. – lalmagor Jul 19 '21 at 21:45
  • What about "mdadm --assemble /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh"? Do you think this would work? Can I specify the raid level here too? – lalmagor Jul 19 '21 at 22:14
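
For reference, the non-destructive sequence the comments converge on would look something like this (a sketch only, using the device names from the question and leaving the faulty /dev/sde out; nothing is forced and no superblocks are rewritten):

mdadm --stop /dev/md127              # clear the inactive, half-assembled array first
for d in /dev/sd{b,c,d,f,g,h}; do    # feed the six good members back in one by one
    mdadm --incremental "$d"
done
mdadm --run /dev/md127               # then ask md to start the array even though it is degraded

Note that `mdadm --create` (as floated in one of the comments) rewrites the superblocks and should be an absolute last resort on an array you are trying to recover.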

1 Answer


I was able to solve this by running:

mdadm --assemble --force /dev/md127 /dev/sdb /dev/sdc /dev/sdd /dev/sdf /dev/sdg /dev/sdh

This restored my array in a degraded state with 6/7 drives. It did not work without the --force option; I guess I was lucky that the event count difference between /dev/sdh and the rest was small. Afterwards I was able to add the new disk to the array with:

mdadm --manage /dev/md127 --add /dev/sde

After 49 hours of rebuilding, my array was complete again.
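
For anyone following the same path: the rebuild progress shows up in /proc/mdstat, so it can be watched with standard tooling along these lines (nothing here is specific to my setup):

watch -n 60 cat /proc/mdstat                                 # recovery progress and ETA, refreshed every minute
mdadm -D /dev/md127 | grep -E 'Raid Level|State|Rebuild'     # spot-check the level, state and rebuild status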

I think my problem was similar to: https://unix.stackexchange.com/questions/163672/missing-mdadm-raid5-array-reassembles-as-raid0-after-powerout

I also used this guide: https://web.archive.org/web/20210302160944/http://www.tjansson.dk/2013/12/replacing-a-failed-disk-in-a-mdadm-raid/
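
One final housekeeping step worth considering (standard practice rather than something from the guides above): refresh the array definition in /etc/mdadm.conf so the array assembles cleanly at boot, e.g.:

mdadm --detail --scan >> /etc/mdadm.conf     # append the current definition; remove any stale line for the same ARRAY/UUID first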
