
Does anybody have experience with RAID configuration on the new Lenovo ThinkServer machines?

My problem is the following: in order to boot this server, I must put the drives into RAID1 to create an SCM device.

I did this with 2x1TB drives. Because the Debian wheezy installer did not even recognize the RAID controller, I had to install the system externally with debootstrap on another machine.

I finally ended up with a working system, and now I would like to put it into RAID.

md126 : active raid1 sda[0]
      975585280 blocks super external:/md127/0 [2/1] [U_]

md127 : inactive sda[0](S)
      1177304 blocks super external:ddf

unused devices: <none>

However, when I try to re-add the second drive (yes, it got the sdg drive letter) to the array, I get this error message:

mdadm --manage /dev/md126 --add /dev/sdg
mdadm: Cannot add disks to a 'member' array, perform this operation on the parent container

If I examine the two disks separately, I see:

/dev/sda:
          Magic : de11de11
        Version : 01.00.00
Controller GUID : 4C534920:20202020:FFFFFFFF:FFFFFFFF:FFFFFFFF:FFFFFFFF
                  (LSI     )
 Container GUID : 4C534920:20202020:80861D60:00000000:4229D10D:4229E531
                  (LSI      03/05/15 16:32:29)
            Seq : 00000001
  Redundant hdr : yes
  Virtual Disks : 1

      VD GUID[0] : 4C534920:20202020:80861D60:00000000:422AD2BC:00001450
                  (LSI      03/06/15 10:51:56)
         unit[0] : 0
        state[0] : Degraded, Not Consistent
   init state[0] : Fully Initialised
       access[0] : Read/Write
         Name[0] : 
 Raid Devices[0] : 2 (0 1)
   Chunk Size[0] : 128 sectors
   Raid Level[0] : RAID1
  Device Size[0] : 975585280
   Array Size[0] : 975585280

 Physical Disks : 2
      Number    RefNo      Size       Device      Type/State
         0    ee4c2c39  975585280K /dev/sda        active/Online
         1    f70c96f2  975585280K                 active/Offline, Failed, Missing


/dev/sdg:
          Magic : de11de11
        Version : 01.00.00
Controller GUID : 4C534920:20202020:FFFFFFFF:FFFFFFFF:FFFFFFFF:FFFFFFFF
                  (LSI     )
 Container GUID : 4C534920:20202020:80861D60:00000000:4229D10D:4229E531
                  (LSI      03/05/15 16:32:29)
            Seq : 0000002b
  Redundant hdr : yes
  Virtual Disks : 1

      VD GUID[0] : 4C534920:20202020:80861D60:00000000:4229F055:00001450
                  (LSI      03/05/15 18:45:57)
         unit[0] : 0
        state[0] : Degraded, Consistent
   init state[0] : Not Initialised
       access[0] : Read/Write
         Name[0] : 
 Raid Devices[0] : 2 (0 1)
   Chunk Size[0] : 128 sectors
   Raid Level[0] : RAID1
  Device Size[0] : 975585280
   Array Size[0] : 975585280

 Physical Disks : 2
      Number    RefNo      Size       Device      Type/State
         0    ee4c2c39  975585280K                 active/Offline, Failed, Missing
         1    f70c96f2  975585280K /dev/sdg        active/Online

What is really going on here with these md126 devices? I think this Lenovo RAID controller is nothing more than a fake RAID controller, like many I have encountered on HP servers: it lets you create a RAID array, but then it is up to the OS to do the replication itself, so it is no better than doing it on your own with mdadm. As a matter of fact, it complicates things unnecessarily.

I would have loved to skip this hardware RAID entirely if I could have made the machine boot some other way...

I think the answer to this question will be useful to a lot of other people who run into this relatively new server series.

Thanks

zino

2 Answers

I am answering my own question for the benefit of everyone who has to deal with this type of fake RAID controller.

Here is what I did:

1, Zero out the superblock on the second disk (sdg), which the RAID BIOS wrote to it at start:

mdadm --zero-superblock /dev/sdg

2, Interestingly, md126 is not the main RAID array:

mdadm -Q --examine /dev/md126
/dev/md126:
   MBR Magic : aa55
Partition[0] :       979902 sectors at           63 (type 83)
Partition[1] :    195318270 sectors at       979965 (type 83)
Partition[2] :     29302560 sectors at    196298235 (type 82)
Partition[3] :   1727924373 sectors at    225600795 (type 83)

3, It is md127. So all I did was re-add the new drive to md127 with:

mdadm --manage /dev/md127 --force --add /dev/sdg

I had to force it because the drive was slightly bigger.

4, Now the RAID is rebuilding itself:

Personalities : [raid1] 
md126 : active raid1 sdg[2] sda[0]
      975585280 blocks super external:/md127/0 [2/1] [U_]
      [>....................]  recovery =  3.3% (32576000/975585280) finish=203.9min speed=77076K/sec

md127 : inactive sdg[1](S) sda[0](S)
      2354608 blocks super external:ddf

unused devices: <none>
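
For anyone who wants to watch the rebuild from a script rather than re-reading the whole file, here is a minimal sketch that pulls the percentage out of mdstat-formatted text like the output above. The function name `rebuild_progress` is made up for this example and is not part of mdadm; it assumes the stanza layout shown above (an `mdN :` header line followed by indented detail lines).

```shell
#!/bin/sh
# rebuild_progress is a hypothetical helper name, not an mdadm command.
# It reads /proc/mdstat-formatted text on stdin and prints the
# recovery/resync percentage for the array named in $1.
# It prints nothing if that array is not currently rebuilding.
rebuild_progress() {
    awk -v dev="$1" '
        # A line such as "md126 : active raid1 ..." opens a new stanza.
        /^md[0-9]+ :/ { current = $1 }
        # Progress line, e.g. "[>....]  recovery =  3.3% (...) finish=..."
        current == dev && /(recovery|resync) *=/ {
            if (match($0, /[0-9]+(\.[0-9]+)?%/)) {
                print substr($0, RSTART, RLENGTH)
                exit
            }
        }
    '
}
```

On a live system it would be called as `rebuild_progress md126 < /proc/mdstat`. Note that the progress shows up under the member array (md126), not under the container (md127).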

What I'm curious about is what Lenovo's RAID BIOS will say about the array at the next reboot: will it recognize it as a healthy array, or say it is still degraded (which I suspect)? I strongly recommend against buying these cheap ThinkServers; the lenovo brand doesn't even deserve to be capitalized anymore, given the garbage laptops they have been making recently (the same goes for the servers).

Also, there are some device-mapper ioctl errors in the logs after the recovery started. Hopefully they won't affect the rebuild of the array.

[Tue Mar 17 12:29:07 2015] md: recovery of RAID array md126
[Tue Mar 17 12:29:07 2015] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[Tue Mar 17 12:29:07 2015] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[Tue Mar 17 12:29:07 2015] md: using 128k window, over a total of 975585280k.
[Tue Mar 17 12:29:08 2015] device-mapper: table: 254:0: mirror: Device lookup failure
[Tue Mar 17 12:29:08 2015] device-mapper: ioctl: error adding target to table
[Tue Mar 17 12:29:09 2015] device-mapper: table: 254:0: mirror: Device lookup failure
[Tue Mar 17 12:29:09 2015] device-mapper: ioctl: error adding target to table
[Tue Mar 17 12:29:16 2015] device-mapper: table: 254:1: mirror: Device lookup failure
[Tue Mar 17 12:29:16 2015] device-mapper: ioctl: error adding target to table
zino

(This is not an answer, but simply a side note for anyone trying to correct the issue using Webmin; please see the answer above for a better explanation.)

I tried adding the spare through Webmin (the UI way), but since it also incorrectly sees md126 as the main RAID array, that was impossible in Webmin. However, I was able to see the rebuild progress in Webmin > Hardware > Linux RAID.

Zeroing the superblock on the spare did not work in my case, so I just skipped that step. In my case md127 was also the main array, and simply adding the spare to the correct RAID device worked:

mdadm --manage /dev/md127 --add /dev/sdc

It failed in Webmin because Webmin would run:

mdadm --manage /dev/md126 --add /dev/sdc
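
The container name does not have to be guessed: in /proc/mdstat the member array's line carries a `super external:/md127/0` field, which names the parent container. Below is a rough sketch of looking that up with a small shell function. `parent_container` is a hypothetical name chosen for this example, and it assumes the mdstat layout shown in the answer above.

```shell
#!/bin/sh
# parent_container is a hypothetical helper name, not an mdadm command.
# It reads /proc/mdstat-formatted text on stdin and prints the container
# that the member array named in $1 belongs to (e.g. md127 for md126).
# It prints nothing for arrays without an "external:/mdN/..." field,
# such as the container itself ("super external:ddf").
parent_container() {
    awk -v dev="$1" '
        # A line such as "md126 : active raid1 ..." opens a new stanza.
        /^md[0-9]+ :/ { current = $1 }
        current == dev && match($0, /external:\/md[0-9]+\//) {
            # Drop the leading "external:/" (10 chars) and the trailing "/".
            print substr($0, RSTART + 10, RLENGTH - 11)
            exit
        }
    '
}
```

With that, the add command from above could be pointed at the right device, for example `mdadm --manage "/dev/$(parent_container md126 < /proc/mdstat)" --add /dev/sdc`. It would be worth checking that the lookup printed a name before running mdadm with it.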

Here are my steps (console top, Webmin bottom):

Webmin v 1.890 showing wrong RAID device name

Michael M
  • @swelljoe .. I am using Webmin 1.890. [I opened an issue for this](https://github.com/webmin/webmin/issues/998) – Michael M Dec 01 '18 at 20:41