
I have a server hosted by OVH, configured with a software RAID1 array, /dev/md0, that holds an LVM PV. There is another array, /dev/md1, which was supposed to be /boot but ended up not being used.

This morning, my server sent me lots of logs indicating that /dev/sdb was producing I/O errors (so I guess it's basically dead). Shortly after, MySQL crashed and SSH refused all connections. I had no choice but to reboot (it's a remote server, so I have no physical access).

When it booted up, the web server running was nginx, which I had used initially but replaced with Apache some time ago.

Finding that disturbing, I immediately rebooted into rescue mode to calmly retrieve my data and get things ready to replace the disk.

Now I have done that, and to my surprise, after running mdadm --assemble --scan, the resulting mdstat is:

# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty] 
md125 : active raid1 sda1[0]
      314571640 blocks super 1.2 [2/1] [U_]

md126 : active raid1 sdb1[2]
      314571640 blocks super 1.2 [2/1] [_U]

md127 : active raid1 sda2[0] sdb2[1]
      1048000 blocks super 1.2 [2/2] [UU]

unused devices: <none>

It looks fairly clear that sda1 and sdb1 are each seen as a separate array, each missing its other half. And whenever I run pvscan, it tells me:

# pvscan
  Found duplicate PV fuQ8NF1x1aifPHtGffNEF1sKw6ZNwv29: using /dev/md126 not /dev/md125
  PV /dev/md126   VG unit05   lvm2 [300.00 GiB / 112.00 GiB free]
  Total: 1 [300.00 GiB] / in use: 1 [300.00 GiB] / in no VG: 0 [0   ]
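
For reference, this is roughly how I plan to pull the data off while in rescue mode; it is only a sketch, and the LV name below is a placeholder (I still have to check the real names with lvs):

# Activate the VG on whichever md device LVM picked (md126 here),
# list its logical volumes, then mount them read-only to copy data off.
vgchange -ay unit05
lvs unit05
mount -o ro /dev/unit05/data /mnt    # "data" is a placeholder LV name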

The feeling I get is that at some point in the past, one of my disks went its own way and stopped being kept in sync.

As it happens, the data found in the LVM (that is, the data from /dev/sdb) seems to be up to date.

What should my course of action be?

  • Check whether the data is indeed out of sync (see the sketch after this list)
  • If so, get the data synced back, then replace the disk
  • If not, should I just replace the disk and wait for mdadm to sync the new one?
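
For the first point, this is what I was planning to run (a sketch, assuming the device names from the mdstat above) to compare the two halves' superblocks; the member with the higher event count and the later update time should hold the newer data:

# Compare the RAID superblocks of the two members.
mdadm --examine /dev/sda1 | grep -E 'Update Time|Events'
mdadm --examine /dev/sdb1 | grep -E 'Update Time|Events'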

1 Answer


While this is old, I ran into this issue and figured I would add my fix in case anyone comes looking in the future. The issue, it appears, is that LVM finds the PVs on the individual disks before finding them in the RAID array. Once LVM grabs the disk, assembly of the RAID array fails. My fix was simply to tell LVM not to scan those devices. On each boot I would end up with a single physical volume and an error about the size being incorrect (which made sense, because I have RAID 10). On CentOS 7 I added the following to the devices section of /etc/lvm/lvm.conf:

filter = [ "r|/dev/sda|","r|/dev/sdb|","r|/dev/sdc|","r|/dev/sdd|" ]

This tells LVM not to scan sda through sdd and ensures that the RAID array can assemble its drives properly.
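
It may also be necessary to regenerate the initramfs after editing lvm.conf, since the initramfs carries its own copy of it and the filter should apply during early boot as well; a quick sketch for CentOS 7:

# Rebuild the initramfs for the running kernel so the new lvm.conf filter
# is used during early boot, then check that LVM only sees the md device.
dracut -f
pvs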

More documentation here: RedHat documentation: LVM filters
