
The short version: I have a failed RAID 5 array which has a bunch of processes hung waiting on I/O operations on it; how can I recover from this?

The long version: Yesterday I noticed that Samba access was very erratic; accessing the server's shares from Windows would randomly lock up Explorer completely after clicking on one or two directories. I assumed it was Windows being a pain and left it. Today the problem is the same, so I did a little digging; the first thing I noticed was that running ps aux | grep smbd gives a lot of lines like this:

ben        969  0.0  0.2  96088  4128 ?        D    18:21   0:00 smbd -F
root      1708  0.0  0.2  93468  4748 ?        Ss   18:44   0:00 smbd -F
root      1711  0.0  0.0  93468  1364 ?        S    18:44   0:00 smbd -F
ben       3148  0.0  0.2  96052  4160 ?        D    Mar07   0:00 smbd -F
...

There are a lot of processes stuck in the "D" (uninterruptible sleep) state. Running ps aux | grep " D" turns up some other processes, including my nightly backup script, all of which need to access the volume mounted on my RAID array at some point. After some googling, I found that it might be down to the RAID array failing, so I checked /proc/mdstat, which shows this:

ben@jack:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdb1[3](F) sdc1[1] sdd1[2]
      2930271872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>

And running mdadm --detail /dev/md0 gives this:

ben@jack:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Sat Oct 31 20:53:10 2009
     Raid Level : raid5
     Array Size : 2930271872 (2794.53 GiB 3000.60 GB)
  Used Dev Size : 1465135936 (1397.26 GiB 1500.30 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Mar  7 03:06:35 2011
          State : active, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f114711a:c770de54:c8276759:b34deaa0
         Events : 0.208245

    Number   Major   Minor   RaidDevice State
       3       8       17        0      faulty spare rebuilding   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       8       49        2      active sync   /dev/sdd1

I believe this says that sdb1 has failed, and so the array is running with two drives out of three 'up'. Some advice I found said to check /var/log/messages for notices of failures, and sure enough there are plenty:

ben@jack:~$ grep sdb /var/log/messages

...

Mar  7 03:06:35 jack kernel: [4525155.384937] md/raid:md0: read error NOT corrected!! (sector 400644912 on sdb1).
Mar  7 03:06:35 jack kernel: [4525155.389686] md/raid:md0: read error not correctable (sector 400644920 on sdb1).
Mar  7 03:06:35 jack kernel: [4525155.389686] md/raid:md0: read error not correctable (sector 400644928 on sdb1).
Mar  7 03:06:35 jack kernel: [4525155.389688] md/raid:md0: read error not correctable (sector 400644936 on sdb1).
Mar  7 03:06:56 jack kernel: [4525176.231603] sd 0:0:1:0: [sdb] Unhandled sense code
Mar  7 03:06:56 jack kernel: [4525176.231605] sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar  7 03:06:56 jack kernel: [4525176.231608] sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Mar  7 03:06:56 jack kernel: [4525176.231623] sd 0:0:1:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Mar  7 03:06:56 jack kernel: [4525176.231627] sd 0:0:1:0: [sdb] CDB: Read(10): 28 00 17 e1 5f bf 00 01 00 00
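
Assuming smartmontools is installed, the drive's own SMART status should corroborate this; purely a sketch, I haven't included its output here:

sudo smartctl -H /dev/sdb    # overall health self-assessment
sudo smartctl -a /dev/sdb    # full attributes, e.g. reallocated and pending sector counts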

To me it is clear that device sdb has failed, and I need to stop the array, shut down, replace it, reboot, then repair the array, bring it back up and mount the filesystem. I cannot hot-swap a replacement drive in, and don't want to leave the array running in a degraded state. I believe I am supposed to unmount the filesystem before stopping the array, but that is failing, and that is where I'm stuck now:

ben@jack:~$ sudo umount /storage
umount: /storage: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

It is indeed busy; there are some 30 or 40 processes waiting on I/O.
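
For reference, the tools that umount points at should list exactly who is holding the mount (though lsof can itself block if the filesystem has become unresponsive); just a sketch of the sort of thing I mean:

sudo fuser -vm /storage    # every process with files open on the filesystem
sudo lsof /storage         # the same list, with the open files themselves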

What should I do? Should I kill all these processes and try again? Is that a wise move when they are 'uninterruptible'? What would happen if I tried to reboot?

Please let me know what you think I should do. And please ask if you need any extra information to diagnose the problem or to help!

Ben Hymers

3 Answers


I don't think you need to stop the array. Simply fail /dev/sdb, remove it (I suppose it's a pluggable hard drive), and plug in a new drive that you'll declare as a hot spare.
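
Roughly something like this, where /dev/sdX1 stands in for whatever partition you create on the replacement disk (it needs to be partitioned to match the existing members first):

sudo mdadm /dev/md0 --fail /dev/sdb1      # already marked (F) in mdstat, but harmless to repeat
sudo mdadm /dev/md0 --remove /dev/sdb1    # drop the dead member from the array
# swap in and partition the new disk, then:
sudo mdadm /dev/md0 --add /dev/sdX1       # md starts rebuilding onto the new member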

wazoox
  • Sadly they're not pluggable, and even if they were I don't have a spare ready, so I'd rather stop the array than run it degraded until I get a replacement, despite the extensive backups I have. I'll update my question to mention this; my apologies for leaving it out in the first place! – Ben Hymers Mar 08 '11 at 22:59
  • You should still be able to fail /dev/sdb and keep the array running in the degraded state, without having to unmount the filesystem – Mike Mar 08 '11 at 23:13
  • Note that /dev/sdb is already labelled as 'failed' in the output from `/proc/mdstat`. I just edited the question to include `mdadm --detail /dev/md0` output to make this more clear. The array is already running degraded but I wish to take it down since it will likely be days before I can replace the failed drive. Currently though I can't shut down :( – Ben Hymers Mar 08 '11 at 23:47
  • I'll accept this answer as it's the most relevant to the original question, and would have worked if I had hotswapping enabled! Thanks. – Ben Hymers Mar 14 '11 at 11:49

You can't kill a process that is attempting I/O. What you'll have to do is use the lazy option of the umount command to remove the filesystem from the filesystem namespace even though files on it are still open. For more information on this (and other "quirks" of this aspect of Linux's design), see Neil Brown.

umount -l /storage
sciurus
  • That seems like what I'm after, thanks! I'll try it out right now. – Ben Hymers Mar 08 '11 at 23:01
  • Well, the filesystem is now unmounted, but `ps` still reports many processes still in the "D" state. I still can't kill them either. What can I do to get rid of them? – Ben Hymers Mar 08 '11 at 23:04
  • It's impossible without rebooting. It shouldn't matter for your RAID recovery process. – sciurus Mar 08 '11 at 23:52
  • In that case, it now boils down to one thing: How do I reboot? :) `ps` shows that `halt` is also in the uninterruptable state after a `init 0` or `shutdown`! – Ben Hymers Mar 09 '11 at 00:05
  • Sorry, really this is another question; I'll start another question rather than discussing this in the comments. – Ben Hymers Mar 09 '11 at 00:29
  • New question is here: http://serverfault.com/questions/244978/halt-goes-into-uninterruptable-state-while-shutting-down-how-can-i-shut-down – Ben Hymers Mar 09 '11 at 00:38

You could also stop the Samba process, which would stop the writes to the disk and allow the current writes to finish, rather than unmounting the filesystem which is being written to.
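
Something like the following, though the exact service name depends on the distribution and init system in use:

sudo service smbd stop          # upstart job on recent Ubuntu releases
sudo /etc/init.d/samba stop     # SysV init script on older Debian/Ubuntu setups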

Mike
  • I tried `sudo service smbd stop`, but the smbd processes are still running, and subsequent stop commands give me `stop: Unknown instance` which I assume means it thinks it's already stopped. – Ben Hymers Mar 08 '11 at 22:57
  • `ps -ef |grep smbd` should tell you which open threads are still active. – Mike Mar 08 '11 at 23:12
  • Yes, that reports some 30 or 40 smbd processes, similar to the `ps aux` output in the question. It's not just smbd processes that are stalled though, there are some other things like `ls`, `halt`, `python`, `/usr/bin/updatedb.mlocate` and `[jbd2/md0-8]`. Killing the samba processes is probably harmless but I don't know about these others. That, and killing them seems impossible at the moment. – Ben Hymers Mar 08 '11 at 23:31