
I have a dedicated server with a RAID 1 across /dev/sda and /dev/sdb. /dev/sda started to fail, so I ordered a replacement. After the intervention the server went into rescue mode, and since support gave me no information, I've been trying to bring it back up myself.

I realised that even though they replaced the defective drive, they didn't copy the partition table over from /dev/sdb or add the new drive back to the mdadm arrays. After doing this myself, /proc/mdstat shows that the recovery is now underway:

root@rescue:/mnt/etc# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md2 : active raid1 sda2[0] sdb2[1]
      523200 blocks [2/2] [UU]

md3 : active raid1 sda3[0] sdb3[1]
      20478912 blocks [2/2] [UU]

md4 : active raid1 sdb4[1] sda4[2]
      3884961728 blocks [2/1] [_U]
      [====>................]  recovery = 22.4% (872776320/3884961728) finish=254.3min speed=197355K/sec
      bitmap: 1/29 pages [4KB], 65536KB chunk

unused devices: <none>
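The replication itself was roughly the following (a sketch of what I ran; `sgdisk` assumes GPT disks — for MBR, `sfdisk -d /dev/sdb | sfdisk /dev/sda` does the same job):

```shell
# Copy the partition table from the healthy disk (sdb) to the new disk (sda).
# Note sgdisk's semantics: the positional device is the source,
# and the device given to -R is the destination.
sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda   # randomise GUIDs so the disks don't share identifiers

# Add the new partitions back into the degraded arrays.
mdadm --manage /dev/md2 --add /dev/sda2
mdadm --manage /dev/md3 --add /dev/sda3
mdadm --manage /dev/md4 --add /dev/sda4
```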

I searched my backups for the fstab that identifies the root partition:

# <file system> <mount point> <type>  <options> <dump>  <pass>
/dev/md3  / ext4  errors=remount-ro,usrjquota=quota.user,jqfmt=vfsv0  0 1
/dev/md2  /boot ext4  errors=remount-ro 0 1
/dev/md4  /home ext4  defaults,usrquota 1 2
/dev/sda5 swap  swap  defaults  0 0
/dev/sdb5 swap  swap  defaults  0 0
proc    /proc proc  defaults    0 0
sysfs   /sys  sysfs defaults    0 0
/dev/sda1 /boot/efi vfat  defaults  0 0
tmpfs   /dev/shm  tmpfs defaults  0 0
devpts    /dev/pts  devpts  defaults  0 0
/usr/tmpDSK             /tmp                    ext3    defaults,noauto        0 0

And I made sure it was intact by mounting it from rescue mode.
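The check itself was just a read-only mount from rescue mode (the mount point is arbitrary):

```shell
# Assemble the arrays if the rescue image hasn't already done so.
mdadm --assemble --scan

# Mount the root array read-only and inspect it.
mkdir -p /mnt
mount -o ro /dev/md3 /mnt
cat /mnt/etc/fstab
umount /mnt
```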

Then I tried netbooting with /dev/md3 as the root partition. However, the server immediately drops into emergency mode, reporting issues with /dev/sda (I assumed because it was still being rebuilt):

FAT-fs (sda1): bogus number of reserved sectors
FAT-fs (sda1): Can't find a valid FAT filesystem

I can't get past the login prompt, as my root password seems to be unrecognised, so I don't know what journalctl -xb shows, but I imagine the checksums for that drive don't add up.

The question is: is there any way to boot the server from the RAID, prioritising /dev/sdb, while the array is still rebuilding? Every reboot sends the mdadm recovery back to 0%, so I want to be completely sure that the next thing I try will work.
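For context, what I mean by prioritising /dev/sdb: I was considering assembling the arrays degraded from rescue mode, using only the partitions on the healthy disk, something like this (an untested sketch, not something I have tried yet):

```shell
# From rescue mode: stop an array and re-assemble it degraded,
# using only the member on the healthy disk (sdb).
# --run forces the array to start with one member missing.
mdadm --stop /dev/md3
mdadm --assemble --run /dev/md3 /dev/sdb3
```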

Javi Prieto

1 Answer

/dev/sda1 /boot/efi vfat  defaults  0 0

Your UEFI ESP is not on a disk array. You need it to boot, but it cannot really be part of an array.

Recover a working ESP file system: restore it from backup, or reinstall the bootloader, which, per the sysadmin guide, is:

yum reinstall grub2-efi shim
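In practice that means recreating the FAT filesystem on the ESP and reinstalling the bootloader from a chroot. A hedged sketch, assuming a RHEL/CentOS system and the layout from the question's fstab (the exact grub.cfg path is distro-dependent):

```shell
# Recreate the ESP and reinstall the EFI bootloader from rescue mode.
mkfs.vfat -F32 /dev/sda1            # wipes the broken ESP
mount /dev/md3 /mnt                 # root array per the question's fstab
mount /dev/sda1 /mnt/boot/efi
for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done
chroot /mnt yum reinstall -y grub2-efi shim
chroot /mnt grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg  # path assumed
```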

(In theory, you can resync an EFI partition with mdadm. The problem is that individual members may be updated by the EFI firmware outside the array, so that is super ugly and hacky.)
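If you do want both disks bootable without putting the ESP in an array, a simpler alternative is to keep a plain copy of the primary ESP on the second disk. A sketch, assuming /dev/sdb1 is also formatted vfat (mount points are illustrative):

```shell
# Mirror the primary ESP onto the second disk's ESP partition.
mount /dev/sda1 /boot/efi
mkdir -p /mnt/efi2
mount /dev/sdb1 /mnt/efi2
rsync -a --delete /boot/efi/ /mnt/efi2/
umount /mnt/efi2
```

You would also need a firmware boot entry pointing at the second disk (e.g. via `efibootmgr`) for the copy to be usable.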

John Mahowald
  • Thanks for pointing me in the right direction. It turns out that even though I copied over all the partitions from `/dev/sdb` to `/dev/sda`, as you point out, `sda1` and `sdb1` aren't part of the disk array, and in fact they should not be. I was able to reformat `/dev/sda1` and reinstall GRUB. Before that, /dev/sdb1 still held valid boot data, so I could change the boot entry in `fstab` and boot as usual. – Javi Prieto Jul 09 '19 at 22:22