I would like to run a custom script when a kernel oops occurs. Is this possible, and if so how?
- What are you trying to achieve? You can probably use kdump and attach other actions to it, but I think you may need to enable the crash-on-oops kernel parameter, which makes every oops unrecoverable. – rvs Apr 11 '16 at 21:02
- Anything you do after a kernel oops is going to be unreliable. The only meaningful thing you could want a script to do after a kernel oops is collect information that could help identify the root cause of the oops. If all you want to collect is the `dmesg` output, then there are better ways to achieve that, such as a serial console or netconsole. – kasperd Apr 11 '16 at 21:07
- There are two things I think I might want to do: send an email notification, and automatically reboot. – Mark Raymond Apr 12 '16 at 05:05
1 Answer
You could install rsyslog and run the script via its Shell Execute action (`^program-to-execute;template`). However, that will probably not work: the system will most likely be unresponsive after a kernel oops and will not run the custom script.
Because of that, I suggest running the script on another server when a kernel oops occurs. For example:
On the server that may produce the kernel oops, forward kernel messages to a "monitor" server using the netconsole module.
```
# /etc/modprobe.d/netconsole.conf
# This example assumes 10.0.0.1 as the "bad" server and 10.0.0.2 as the "monitor" server
options netconsole netconsole=30514@10.0.0.1/eth0,30514@10.0.0.2/01:23:45:67:89:AB
options netconsole oops_only=1
```

```
# /etc/modules-load.d/netconsole.conf
# Tells 'systemd-modules-load' to load 'netconsole' automatically at boot
netconsole
```
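For a quick trial without rebooting, the same parameter string can be passed to `modprobe` by hand. The helper below is only a minimal sketch that assembles the string in the `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac` layout used above; the function name `netconsole_param` is hypothetical, and the addresses in the example are the ones assumed in this answer.

```bash
#!/bin/bash
# Hypothetical helper: assemble a netconsole parameter string.
# Arguments: src_port src_ip device tgt_port tgt_ip tgt_mac
netconsole_param() {
    if [ "$#" -ne 6 ]; then
        echo "usage: netconsole_param src_port src_ip dev tgt_port tgt_ip tgt_mac" >&2
        return 1
    fi
    # Layout: <src_port>@<src_ip>/<dev>,<tgt_port>@<tgt_ip>/<tgt_mac>
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (run as root on the "bad" server; all values are assumptions):
#   modprobe netconsole \
#       netconsole="$(netconsole_param 30514 10.0.0.1 eth0 30514 10.0.0.2 01:23:45:67:89:AB)" \
#       oops_only=1
```

The MAC address should belong to the monitor's network interface, or to the gateway if the monitor sits on a different subnet.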
On the monitor server (the one that receives the kernel oops messages), run the custom script via `rsyslog`.

```
# /etc/rsyslog.d/kernel-oops-handler.conf
module(load="imudp")
input(type="imudp" port="30514" ruleset="KernelOopsRuleSet")

# Passes the IP address of the "bad" server to the script on the command line
template(name="KernelOopsArgs" type="string" string="%fromhost-ip%")

ruleset(name="KernelOopsRuleSet") {
    # Assumes that the '--[ cut here ]--' string indicates a kernel oops
    if ($msg contains "------------[ cut here ]------------") then {
        kern.crit ^/path/to/custom/script.sh;KernelOopsArgs
    }
}
```
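To check the rule without waiting for a real oops, one option is to send a hand-made UDP datagram containing the marker string to the monitor. The sketch below makes several assumptions: the function names are hypothetical, the `<2>` prefix is the standard syslog PRI for `kern.crit` (facility 0, severity 2) and is added on the guess that the `kern.crit` selector needs it to match, and `/dev/udp` redirection only works in a bash built with network redirection support. Point the rule at a harmless script before trying this, since a match runs the handler.

```bash
#!/bin/bash
# Syslog PRI value = facility * 8 + severity (kern = 0, crit = 2).
syslog_pri() {
    echo $(( $1 * 8 + $2 ))
}

# Build a fake datagram carrying the marker string the rule looks for.
fake_oops_payload() {
    printf '<%d>%s\n' "$(syslog_pri 0 2)" "------------[ cut here ]------------"
}

# Send the payload over UDP via bash's /dev/udp pseudo-device.
# Arguments: monitor_ip port
send_fake_oops() {
    fake_oops_payload > "/dev/udp/$1/$2"
}

# Example (assumed monitor address and port from this answer):
#   send_fake_oops 10.0.0.2 30514
```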
The custom script may restart the machine via the out-of-band management interface (iDRAC on Dell servers):
```bash
#!/bin/bash
# /path/to/custom/script.sh

sleep 3
server="${1}"

# A successful SSH to the host indicates the server is still responsive
if ! ssh -n -o ConnectTimeout=10 -o ControlPath=none "${server}" true; then
    # Let me suppose 10.100.0.1 is the iDRAC IP address of a server whose IP is 10.0.0.1
    idrac="$(echo "${server}" | sed 's/^10\.0\./10.100./')"

    # Trigger a forced reboot using 'ipmitool'
    ipmitool -H "${idrac}" -U root -P root chassis power reset

    # Notify administrators
    mail -s "Server '${server}' was restarted!" sysadmins@example.com < /dev/null
fi
```
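Because the script ends in a forced power reset, it may be worth refusing to act on addresses outside the expected range. The following is a minimal sketch of the same 10.0.x.y to 10.100.x.y mapping with that check added; the function name `idrac_for` is hypothetical and the addressing scheme is the one assumed above.

```bash
#!/bin/bash
# Hypothetical helper: map a server IP (10.0.x.y) to its iDRAC IP (10.100.x.y).
# Returns non-zero for any address outside 10.0.0.0/16, so the caller
# can skip the reset instead of contacting an unrelated management address.
idrac_for() {
    local server="$1"
    case "${server}" in
        10.0.*) printf '10.100.%s\n' "${server#10.0.}" ;;
        *)      return 1 ;;
    esac
}

# Example use inside the handler:
#   if idrac="$(idrac_for "${server}")"; then
#       ipmitool -H "${idrac}" -U root -P root chassis power reset
#   fi
```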

Anderson Medeiros Gomes