
I have a raid5 array that has a check run on it once a month. It is configured so that the check runs for 6 hours, starting at 01:00, and then stops. On the following nights it resumes the check for another 6 hours at a time until it has completed.
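
For background, on openSUSE this schedule is normally driven by the mdcheck_start/mdcheck_continue systemd units shipped with mdadm (those unit names are an assumption about this particular setup). The timer schedule and the --duration argument behind the 6-hour window can be inspected with:

 # show when the mdcheck timers last ran / will next run (unit names assumed)
 systemctl list-timers 'mdcheck_*'
 # show the service definitions, including the --duration that sets the 6-hour window
 systemctl cat mdcheck_start.service mdcheck_continue.service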

The issue I have is that sometimes, when mdcheck attempts to stop the running check, it hangs. Once this happens you can still read from the array, but any attempt to write results in the writing process hanging.

The array state is as follows:

 md0 : active raid5 sdb1[4] sdc1[2] sdd1[5] sde1[1]
      8790398976 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [========>............]  check = 44.2% (1296999956/2930132992) finish=216065.8min speed=125K/sec
      bitmap: 0/6 pages [0KB], 262144KB chunk

The check = 44.2% (1296999956/2930132992) never advances or stops.
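
To confirm the position is genuinely stuck rather than just progressing very slowly, it can be watched for a few minutes, e.g.:

 watch -n 10 cat /proc/mdstat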

From looking at the /usr/share/mdadm/mdcheck script, it appears that every 2 minutes, until the end time, it reads /sys/block/md0/md/sync_completed and saves the position in a file in the /var/lib/mdcheck/ directory. Looking in that directory, the file is there, dated 2 minutes before the check was due to stop, and contains the value 2588437040. The current value of sync_completed is 2593999912, which indicates that the check was still progressing 2 minutes before it was due to stop.
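
For reference, the live position and the script's checkpoint can be compared with something like the following (the exact checkpoint filename under /var/lib/mdcheck/ depends on the array, so the glob is illustrative):

 # current check position as reported by the md driver
 cat /sys/block/md0/md/sync_completed
 # checkpoint files written by the mdcheck script
 grep . /var/lib/mdcheck/*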

Running lsof on the mdcheck process reveals the following:

 mdcheck 23887 root    1w   REG               0,21     4096     43388 /sys/devices/virtual/block/md0/md/sync_action
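
(For anyone reproducing this, the line above can be obtained with something like the commands below; 23887 is the mdcheck PID. Fd 1 being open for write on sync_action is consistent with the script redirecting its echo into that file.)

 pgrep -af mdcheck
 sudo lsof -p 23887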

This appears to show that the mdcheck process hangs when trying to stop the check after 6 hours. I confirmed this by running sudo echo idle >/sys/devices/virtual/block/md0/md/sync_action in a terminal, and this also hung.
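
To gather evidence about where things are blocked in the kernel without rebooting (useful both for diagnosis and for a bug report), something like the following should work; 23887 is the mdcheck PID from the lsof output above:

 # kernel stack of the hung mdcheck process
 sudo cat /proc/23887/stack
 # ask the kernel to log all blocked (uninterruptible) tasks, then read the log
 echo w | sudo tee /proc/sysrq-trigger
 sudo dmesg | tail -n 200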

The only way I have found to stop the check is to attempt a reboot, which also hangs, and then cycle the power.

How do I stop/unhang the mdcheck (and hence the array) without a reboot and how do I find out what the cause of the issue is (and resolve it)?

Additional information:

OS: OpenSUSE Leap 15.2

Kernel: 5.3.18-lp152.57-default

Running the consistency check without interruption succeeds.

Running extended self tests on the disks succeeds.

Replacing all the SATA cables has no effect.
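
(For reference, the extended self tests mentioned above can be started and reviewed per disk with smartctl, for example:)

 sudo smartctl -t long /dev/sdb        # start the extended self test
 sudo smartctl -l selftest /dev/sdb    # check the result once it completes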

Relevant dmesg entries:

[    5.565328] md/raid:md0: device sdb1 operational as raid disk 3
[    5.565330] md/raid:md0: device sdc1 operational as raid disk 2
[    5.565331] md/raid:md0: device sdd1 operational as raid disk 0
[    5.565332] md/raid:md0: device sde1 operational as raid disk 1
[    5.575520] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[    5.640309] md0: detected capacity change from 0 to 9001368551424
[53004.024693] md: data-check of RAID array md0
[74605.665890] md: md0: data-check interrupted.
[139404.408605] md: data-check of RAID array md0
[146718.260616] md: md0: data-check done.
[1867115.595820] md: data-check of RAID array md0

Output of mdadm --detail /dev/md0:

           Version : 1.2
     Creation Time : Sat Nov  7 09:48:15 2020
        Raid Level : raid5
        Array Size : 8790398976 (8.19 TiB 9.00 TB)
     Used Dev Size : 2930132992 (2.73 TiB 3.00 TB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Feb  2 06:59:55 2021
             State : active, checking 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

      Check Status : 44% complete

              Name : neptune:0  (local to host neptune)
              UUID : 5dd490df:79bf70fa:b4b530bc:47b30419
            Events : 28109

    Number   Major   Minor   RaidDevice State
       5       8       49        0      active sync   /dev/sdd1
       1       8       65        1      active sync   /dev/sde1
       2       8       33        2      active sync   /dev/sdc1
       4       8       17        3      active sync   /dev/sdb1

Output of mdadm --examine /dev/sdb1 (all disks are essentially the same):

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 5dd490df:79bf70fa:b4b530bc:47b30419
           Name : neptune:0  (local to host neptune)
  Creation Time : Sat Nov  7 09:48:15 2020
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 5860266895 sectors (2.73 TiB 3.00 TB)
     Array Size : 8790398976 KiB (8.19 TiB 9.00 TB)
  Used Dev Size : 5860265984 sectors (2.73 TiB 3.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=911 sectors
          State : clean
    Device UUID : a40bb655:70a88240:06dfad1d:f7fcbdca

Internal Bitmap : 8 sectors from superblock
    Update Time : Tue Feb  2 06:59:55 2021
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 42b3d6 - correct
         Events : 28109

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
Paranoid
  • What was in `dmesg` at the time that it hung? – Michael Hampton Feb 02 '21 at 12:58
  • @MichaelHampton There is only one dmesg entry after the last md one shown above. It occurs just after mdcheck hangs, and it is an NFS server failing to respond because the array is hanging. – Paranoid Feb 02 '21 at 13:49
  • Well that's odd. This is behavior I'd expect from a faulty disk, but you would usually see something in `dmesg`. It's worth running the extended self-tests on each disk though, just in case. – Michael Hampton Feb 02 '21 at 13:56
  • I did, and all of the disks passed. I also changed the SATA cables in case one of them was faulty. – Paranoid Feb 02 '21 at 14:15
  • I've seen a very similar problem twice on different hardware and I'm currently thinking it's some as-yet-unknown kernel bug / race condition. The best thing you can do is `echo s > /proc/sysrq-trigger`, wait a moment for everything in progress to sync to disk, `echo u > /proc/sysrq-trigger` to remount everything possible read-only, and `echo b > /proc/sysrq-trigger` to reboot the system without waiting for anything. You can bring down services manually beforehand as much as possible, but you cannot run anything that tries to touch the hung mdraid. Do not ask `systemd` to reboot the system. – Mikko Rantalainen Sep 13 '21 at 13:21

1 Answer


It's probably this bug:

If that is indeed your issue, then you can try this workaround (swap md1 for md0/md2/etc. first):

echo active | sudo tee /sys/block/md1/md/array_state
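
If the write goes through, the array should unstick; you can then confirm with (md0 here, to match the array in the question):

 cat /sys/block/md0/md/array_state
 cat /sys/block/md0/md/sync_action
 cat /proc/mdstat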

Luke Yeager