I had some huge file transfers operating on an NFS mount. The server on which the mount point resided was carelessly rebooted, and now the server from which these large transfers were initiated seems to be bogged down by them.

If I run top, I see the following:

[screenshot: top output]

The first thing I tried was to run kill with each of the -1, -2, -9, and -15 flags against each of the process IDs shown above in turn. Each command returned me to the prompt, but didn't kill the processes. The next thing I attempted was to reboot the server, but neither reboot nor shutdown -r now worked. When I ran shutdown -r now, the standard broadcast message was sent out, but the server did not reboot. I confirmed this by checking the server uptime, which was still 25 days.

So now I'm a little stuck. I'm running these commands as root.

EDIT: Here's another interesting tidbit:

[screenshot]

In top, I don't see that any other processes are using more than a fraction of a percent of memory or more than 5% of CPU.

EDIT 2: output of /var/log/messages

[screenshot: /var/log/messages output]

DeeDee
  • If the parent process isn't 1, try killing that. – Matthew Ife Jun 22 '12 at 19:39
  • @Mlfe thanks! I just tried kill 1 per your advice but it didn't kill the init process. – DeeDee Jun 22 '12 at 19:43
  • I also unmounted the problem share, but that didn't allow me to kill the jobs – DeeDee Jun 22 '12 at 19:44
  • I said don't kill 1 :). You could try a 'drastic' reboot: "sync; halt -f -d -n --reboot". This will forcibly reboot the host without switching runlevels. It might cause the host to become unresponsive and need physical intervention. It's down to you whether to take that risk. – Matthew Ife Jun 22 '12 at 19:50
  • @Mlfe Thanks for the clarification! I'm a little frazzled, so I misunderstood it at first. If I don't get any additional answers within the next five minutes I'll give your "drastic" measure a try. The server is a PitA to physically access, so I'm trying to avoid that if possible. – DeeDee Jun 22 '12 at 19:57
  • @Mlfe I enacted the "drastic measure" and it didn't do *anything*. o_O What sort of gremlin has found his way into my server?! – DeeDee Jun 22 '12 at 20:09

2 Answers

OK, time for something even more drastic!

echo 1 >/proc/sys/kernel/panic

This tells the kernel to reboot the host 1 second after a kernel panic.

echo c >/proc/sysrq-trigger

This forces the kernel to panic. So hopefully you'll end up rebooting the host.
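The two commands above can be wrapped in a small function with a dry-run default, so nothing happens by accident. This is only a sketch: the function name `panic_reboot` and the `APPLY` and `SYSRQ_ROOT` variables are made up for illustration and are not part of any tool. It deliberately does not call sync first, on the assumption that sync could itself block on the dead NFS mount.

```shell
#!/bin/sh
# Sketch: reboot by forcing a kernel panic, as described above.
# Dry run unless APPLY=1. SYSRQ_ROOT is a hypothetical override that
# lets the function be tried against a scratch directory instead of /proc.
panic_reboot() {
    root="${SYSRQ_ROOT:-/proc}"
    delay="${1:-1}"   # seconds between the panic and the reboot

    if [ "${APPLY:-0}" != "1" ]; then
        echo "would run: echo $delay > $root/sys/kernel/panic"
        echo "would run: echo c > $root/sysrq-trigger"
        return 0
    fi

    # Reboot automatically $delay seconds after a panic.
    echo "$delay" > "$root/sys/kernel/panic" || return 1
    # Trigger the crash. On a real /proc the session ends here.
    echo c > "$root/sysrq-trigger"
}
```

Run `panic_reboot` first to see what it would do, then `APPLY=1 panic_reboot` as root once you are sure. Unsaved data is lost, as with any panic.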

Matthew Ife
  • OK, that caused it to close the SSH connection. That's promising. Now for the long silence.... :) Man I'm learning a lot today.... – DeeDee Jun 22 '12 at 20:24
  • @Mlfe So it looks like some of my commands were getting through, and some weren't. I can't ping or SSH into the server, so it looks like I need to track down its physical location. Regardless, I bet those processes are finally killed, which is what I care about. Some people strive to be sysadmins, some have sysadmin duties thrust upon them :) – DeeDee Jun 22 '12 at 20:32
  • @DeeDee being thrown into the lion's den with a pocket full of meat is the best way to learn. – Banjer Jun 22 '12 at 20:34
  • @Banjer Well, my name is "Daniel" XD – DeeDee Jun 22 '12 at 20:45
  • @Banjer Aaaaaand we're back. Many thanks, guys. You provided a wealth of information to which I'm sure I'll be returning in the future. – DeeDee Jun 22 '12 at 22:09
  • This is overkill. `echo b >| /proc/sysrq-trigger` is enough. It behaves almost like a hardware reset button. – Totor May 24 '13 at 13:46
You've started the shutdown process, so your best bet is to get that thing rebooted. If it's a physical machine, can you power it off physically or via a service processor?

If not, and you think it's these specific processes hanging things up, then try to kill all processes named mv and gzip with this:

killall mv

killall gzip

As a general disclaimer, be careful with killall: it kills every process with the given name, so make sure you know what you're killing and don't accidentally take out a system process.
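If killall has no effect either, one likely explanation (an assumption, since the top output is not preserved here) is that the processes are in uninterruptible sleep, shown as state D in ps and top, waiting on I/O to the unreachable NFS server. A process in that state typically does not act on signals until the I/O returns. A minimal sketch to list such processes; `d_state_only` is a made-up helper name:

```shell
#!/bin/sh
# Sketch: keep only processes whose state starts with D
# (uninterruptible sleep). Expects `ps -eo pid,stat,comm` style
# input on stdin, with the state in the second column.
d_state_only() {
    # NR > 1 skips the ps header line.
    awk 'NR > 1 && $2 ~ /^D/ { print }'
}

# Usage:
#   ps -eo pid,stat,comm | d_state_only
```

If mv and gzip show up here, the fix has to come from the mount side (server back up, or unmount) or from a reboot, not from more signals.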

Also, see what's mounted with df -h and try unmounting those filesystems. I've seen my Linux systems hang on shutdown when they won't let go of an NFS mount. I usually have to "lazy" unmount them with

umount -l /path/of/mount/point
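To find the mount points to pass to umount -l, the kernel's mount table can be read directly. This is a sketch under the assumption that df itself may stall on a dead NFS mount, whereas reading /proc/mounts does not touch the mounted filesystems. `nfs_mounts` is a made-up helper name, and its optional argument (an alternative mounts file) exists only to make it easy to try out.

```shell
#!/bin/sh
# Sketch: print the mount point of every NFS filesystem.
# /proc/mounts columns: device, mount point, fstype, options, dump, pass.
# Matches fstype nfs and nfs4 exactly, so the server-side "nfsd"
# pseudo-filesystem is not listed.
nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Usage (as root), lazily unmounting each one:
#   nfs_mounts | while read -r mp; do umount -l "$mp"; done
```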

EDIT 1

Other ways to gracefully reboot:

Does your system respond to Ctrl+Alt+Del?

If not, try the magic SysRq key combo: Alt+SysRq+R+E+I+S+U+B (SysRq is the Print Screen key). While holding down Alt and SysRq, type the keys R, E, I, S, U, B one after the other, in order. This terminates all processes first, syncs the filesystems and remounts them read-only, and then reboots. It only works if magic SysRq is enabled in your kernel. FYI:

R: Switch the keyboard from raw mode to XLATE mode
E: Send the SIGTERM signal to all processes except init
I: Send the SIGKILL signal to all processes except init
S: Sync all mounted filesystems
U: Remount all mounted filesystems in read-only mode
B: Immediately reboot the system, without unmounting partitions or syncing
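Without a physical keyboard (over SSH, for example), the same requests can be sent by writing the letters to /proc/sysrq-trigger as root. A minimal sketch, with a dry-run default; `sysrq_send`, `APPLY`, `SYSRQ_TRIGGER` and `SYSRQ_PAUSE` are names made up for this example. Only S, U and B are suggested in the usage line, on the assumption that E and I would kill the SSH session and the shell running the loop before it reaches B.

```shell
#!/bin/sh
# Sketch: send SysRq requests without a keyboard.
# Dry run unless APPLY=1. SYSRQ_TRIGGER is a hypothetical override so
# the loop can be tried against a scratch file; SYSRQ_PAUSE is the
# number of seconds to wait after each key (default 2).
sysrq_send() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    for key in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            echo "$key" > "$trigger" || return 1
            # Give sync / remount a moment before the next request.
            sleep "${SYSRQ_PAUSE:-2}"
        else
            echo "would write '$key' to $trigger"
        fi
    done
}

# Usage (as root): sync, remount read-only, reboot
#   APPLY=1 sysrq_send s u b
```

As far as I know, the kernel.sysrq setting restricts only the keyboard combo, so writes to /proc/sysrq-trigger by root should work even where the key combo is disabled.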
Banjer
  • Thanks! I tried the killall commands, but the little buggers are still kicking. When I ran the lazy umount, it said that the path in question is already unmounted. Good to know that my initial umount worked though! Now I know how doctors must feel when they encounter MRSA :) – DeeDee Jun 22 '12 at 19:53
  • Anything else listed in `df -h`? Any interesting logs in `/var/log/messages`? – Banjer Jun 22 '12 at 19:56
  • df -h doesn't have anything particularly interesting. I edited the post to include the recent relevant content of /var/log/messages – DeeDee Jun 22 '12 at 20:03
  • see other reboot suggestions in my answer above. – Banjer Jun 22 '12 at 20:15
  • that SysRq combo...how...i don't even have enough hands for that! – acolyte Jun 22 '12 at 20:20
  • Thanks @Banjer! I'm on an SSH connection using Putty, so these solutions didn't work. – DeeDee Jun 22 '12 at 20:22
  • @acolyte haha, you actually only hold down Alt+SysRq with one hand, then run through the REISUB key combo one at-a-time with the other hand. DeeDee: see Mlfe's answer regarding forcing kernel panic from command line. – Banjer Jun 22 '12 at 20:25
  • @Banjer ohh. i was going to say... – acolyte Jun 22 '12 at 20:29