
I'm testing a RAID10 array here with mdadm. I wanted to see how many failed devices it could tolerate, rebuild times, etc. At one point I had it doing a resync on 5 or 6 devices, then I rebooted it. Now it is showing as inactive and I'm not sure what it is doing or how to get it back.

There's nothing important on there and I could just recreate it, but I'd prefer to figure out what went wrong and whether it can be recovered.

root@netcu1257-vs-02:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdz[19] sdy[18] sdx[17] sdw[16] sdv[15] sdu[14] sds[12] sdt[13] sdr[11] sdq[10](S) sdp[21] sdn[8] sdm[7] sdo[9] sdl[6] sdj[20](R) sdk[22](S) sdi[4](S) sdh[3] sdf[1] sde[0] sdg[2]
      257812572160 blocks super 1.2

root@netcu1257-vs-02:~# mdadm -D /dev/md0
/dev/md0:
                   Version : 1.2
             Creation Time : Fri Oct 29 13:59:41 2021
                Raid Level : raid10
             Used Dev Size : 18446744073709551615
              Raid Devices : 20
             Total Devices : 22
               Persistence : Superblock is persistent
    
           Update Time : Mon Nov  8 09:59:42 2021
                 State : active, FAILED, Not Started 
        Active Devices : 13
       Working Devices : 22
        Failed Devices : 0
         Spare Devices : 9
    
                Layout : near=2
            Chunk Size : 512K
    
    Consistency Policy : unknown
    
                  Name : netcu1257-vs-02:0  (local to host netcu1257-vs-02)
                  UUID : c3418360:4fb5857c:eb952018:163a60c6
                Events : 85985
    
        Number   Major   Minor   RaidDevice State
           -       0        0        0      removed
           -       0        0        1      removed
           -       0        0        2      removed
           -       0        0        3      removed
           -       0        0        4      removed
           -       0        0        5      removed
           -       0        0        6      removed
           -       0        0        7      removed
           -       0        0        8      removed
           -       0        0        9      removed
           -       0        0       10      removed
           -       0        0       11      removed
           -       0        0       12      removed
           -       0        0       13      removed
           -       0        0       14      removed
           -       0        0       15      removed
           -       0        0       16      removed
           -       0        0       17      removed
           -       0        0       18      removed
           -       0        0       19      removed
    
           -      65      112       17      sync set-B   /dev/sdx
           -       8       64        0      spare rebuilding   /dev/sde
           -       8      208        8      sync set-A   /dev/sdn
           -      65       80       15      sync set-B   /dev/sdv
           -       8      176        6      sync set-A   /dev/sdl
           -      65       48       13      sync set-B   /dev/sdt
           -       8      144        5      spare rebuilding   /dev/sdj
           -      65       16       11      sync set-B   /dev/sdr
           -       8      112        3      sync set-B   /dev/sdh
           -       8      240        7      spare rebuilding   /dev/sdp
           -      65      128       18      sync set-A   /dev/sdy
           -       8       80        1      sync set-B   /dev/sdf
           -       8      224        9      spare rebuilding   /dev/sdo
           -      65       96       16      sync set-A   /dev/sdw
           -       8      192       10      spare rebuilding   /dev/sdm
           -      65       64       14      sync set-A   /dev/sdu
           -       8      160        -      spare   /dev/sdk
           -      65       32       12      sync set-A   /dev/sds
           -       8      128        -      spare   /dev/sdi
           -      65        0        -      spare   /dev/sdq
           -      65      144       19      sync set-B   /dev/sdz
           -       8       96        2      spare rebuilding   /dev/sdg

As you can see, all of my devices (/dev/sd[e-z]) show up as part of md0, yet it also lists 20 devices as removed. The array was originally created with 20 devices plus 2 spares. And although several devices say they are rebuilding, there is no disk activity, and /proc/mdstat indicates the same.

Is this recoverable? And given that the array was rebuilding before the host was rebooted, what could I have done to ensure the rebuild would have continued and the array remain active after a reboot?
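For what it's worth, comparing the members' superblocks shows where the event counts diverge; a quick loop over the devices above (device names are specific to this host):

```shell
# Print the event count and role recorded in each member's superblock;
# members whose Events value lags behind the rest are typically the ones
# md refuses to start the array with after an interrupted resync.
for d in /dev/sd[e-z]; do
    echo "== $d =="
    mdadm --examine "$d" | grep -E 'Events|Device Role|Array State'
done
```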

edit:

I found that my mdadm.conf file had been incorrectly placed in /etc/. I moved it to /etc/mdadm/ and rebooted; now my array is showing as a RAID0, still inactive:

root@netcu1257-vs-02:~# mdadm -D /dev/md0
/dev/md0:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 22
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 22

              Name : netcu1257-vs-02:0  (local to host netcu1257-vs-02)
              UUID : c3418360:4fb5857c:eb952018:163a60c6
            Events : 85985

    Number   Major   Minor   RaidDevice

       -      65      112        -        /dev/sdx
       -       8       64        -        /dev/sde
       -       8      208        -        /dev/sdn
       -      65       80        -        /dev/sdv
       -       8      176        -        /dev/sdl
       -      65       48        -        /dev/sdt
       -       8      144        -        /dev/sdj
       -      65       16        -        /dev/sdr
       -       8      112        -        /dev/sdh
       -       8      240        -        /dev/sdp
       -      65      128        -        /dev/sdy
       -       8       80        -        /dev/sdf
       -       8      224        -        /dev/sdo
       -      65       96        -        /dev/sdw
       -       8      192        -        /dev/sdm
       -      65       64        -        /dev/sdu
       -       8      160        -        /dev/sdk
       -      65       32        -        /dev/sds
       -       8      128        -        /dev/sdi
       -      65        0        -        /dev/sdq
       -      65      144        -        /dev/sdz
       -       8       96        -        /dev/sdg
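
Note for anyone hitting the same symptom: on Debian/Ubuntu the initramfs carries its own copy of mdadm.conf, so moving the file may not take effect until the initramfs is regenerated. A sketch, assuming the distribution-default paths:

```shell
# Regenerate mdadm.conf from the superblocks actually present,
# then rebuild the initramfs so boot-time assembly uses the same config.
mdadm --detail --scan > /etc/mdadm/mdadm.conf
update-initramfs -u
```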
clarknova
  • I doubt this should be on ServerFault. This site is about business problems, not about curious experiments with Linux. I'd move this to Unix&Linux. – Nikita Kipriyanov Nov 11 '21 at 06:53
  • Why do you assume this is not for use in a business? What business are you involved in that doesn't test its technology before putting it in production? – clarknova Nov 12 '21 at 04:54

1 Answer


You need to re-add all of the drives: first every device in set-A, then the same for set-B.

mdadm --manage /dev/mdN -a /dev/sdX1

Before that, try a simple assemble:

mdadm --assemble /dev/mdN /dev/sd? ...
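
In case it helps, a possible sequence using the device names from the question (a sketch, not tested against your array; --force can rewrite superblock event counts, so treat it as a last resort):

```shell
# Release the half-assembled, inactive array first, otherwise
# assemble will complain that the devices are busy.
mdadm --stop /dev/md0
# Reassemble from all members; --run starts the array even if degraded.
mdadm --assemble --run /dev/md0 /dev/sd[e-z]
# If the event counts disagree and the plain assemble refuses:
# mdadm --assemble --force --run /dev/md0 /dev/sd[e-z]
```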
  • https://www.thomas-krenn.com/en/wiki/Mdadm_recovery_and_resync – Ярослав Рахматуллин Nov 11 '21 at 03:51
  • 1
    They likely will see the message about device is being busy if they try to follow your advice. Because devices will be already taken. Also the recommended source of information is [the official Linux RAID wiki](https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn) – Nikita Kipriyanov Nov 11 '21 at 06:49
  • Thanks. I have already gone over the information on the mdadm page in the wiki but hadn't seen the Assemble Run page. I will work with the good information on there, as well as in this answer. – clarknova Nov 12 '21 at 04:57