1

One of my NFS servers disappeared from monitoring this morning. When I checked, the console was hung and unresponsive; the machine had apparently crashed.

I power-cycled it and checked the syslog, which gives no indication of why it crashed.

Are there any kernel or debug settings I can apply to trap a future recurrence of this problem? (Or any recommendations on how to proceed?)

Tom

2 Answers

3

If it hard-crashed completely, with nothing in the logs, I'd strongly suspect a hardware problem. I'd reseat the memory, check that the fans are running properly to cool the server, and, if it's a server-grade system, use the vendor diagnostics to check the equipment. (I know that Dell servers usually have a series of tests that can be run, but it depends on the model whether they live in the BIOS, a startup partition, or a bootable CD.)

Rarely, rarely, rarely have I had Linux crash completely in an unresponsive manner without a kernel dump or something in the logs. I have had systems go nuts due to a dying controller, memory creeping, or something else hardware-related, which can easily do what you're describing.

Bart Silverstrim
2

Check your hardware, as Bart said. Also, an unresponsive machine is sometimes in that state because of a stupidly large load; I have seen some mail servers do that. Check your network too: NFS can crash badly if the network goes away while it is doing something.
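To tell a load-induced hang from a hardware crash after the fact, it can help to have a record of whether the load was climbing beforehand. The following is only a minimal sketch of such a check; the function names and the threshold of 4.0 runnable tasks per CPU are illustrative assumptions, not established limits.

```python
import os


def load_per_cpu(load1: float, cpus: int) -> float:
    """Return the 1-minute load average divided by the CPU count."""
    return load1 / max(cpus, 1)


def is_overloaded(load1: float, cpus: int, threshold: float = 4.0) -> bool:
    """Report whether load per CPU exceeds the (assumed) threshold."""
    return load_per_cpu(load1, cpus) > threshold


def current_load_is_excessive(threshold: float = 4.0) -> bool:
    """Check the running system; os.getloadavg() is available on Unix."""
    load1, _, _ = os.getloadavg()
    return is_overloaded(load1, os.cpu_count() or 1, threshold)
```

Called periodically from cron, with the result written to syslog, something like this would leave a trail showing whether the box was overloaded before it stopped responding.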

If you ever need to power-cycle a machine again, remember the Magic SysRq key and the phrase "Raising Elephants Is So Utterly Boring". Alt+SysRq plus a command key can do wonders on a Linux box that is, to all appearances, dead. The phrase is a mnemonic for the command keys to use with Alt+SysRq, pressed in this order:

R: take control of the keyboard (switch it out of raw mode)
E: send SIGTERM to all processes except init
I: send SIGKILL to all processes except init
S: sync (flush caches to disk, very important)
U: remount all filesystems read-only
B: reboot!
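The key sequence only works if the kernel allows it: many distributions restrict SysRq through the bitmask in /proc/sys/kernel/sysrq. As a rough sketch, the helper below decodes that mask and reports which of the six keys would be honoured from the keyboard. The bit values follow the kernel's sysrq documentation; the function and variable names are made up for this illustration.

```python
from pathlib import Path

# Bit required for each REISUB key, per the kernel's sysrq documentation.
SYSRQ_BITS = {
    "R": 0x04,  # keyboard control (unraw)
    "E": 0x40,  # signalling of processes (SIGTERM)
    "I": 0x40,  # signalling of processes (SIGKILL)
    "S": 0x10,  # sync
    "U": 0x20,  # remount read-only
    "B": 0x80,  # reboot/poweroff
}


def allowed_keys(mask: int) -> str:
    """Return the REISUB keys permitted by a kernel.sysrq value."""
    if mask == 1:  # 1 means "enable all functions"
        return "REISUB"
    return "".join(key for key, bit in SYSRQ_BITS.items() if mask & bit)


def read_mask(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current mask from procfs (Linux only)."""
    return int(Path(path).read_text())
```

For example, `allowed_keys(read_mask())` on a machine set to 176 permits only sync, remount, and reboot. Setting `kernel.sysrq = 1` with sysctl enables all functions. The mask restricts keyboard use only; root can always trigger a command by writing its letter to /proc/sysrq-trigger.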
coredump
  • That's really interesting, I didn't know about the SysRq key. I am looking forward to something crashing again so I can try it ;-) – Tom Jul 20 '10 at 09:14