How do you run a script when a kernel oops occurs?

Question

I would like to run a custom script when a kernel oops occurs. Is this possible, and if so how?

What are you trying to achieve? You can probably use kdump and attach other actions to it, but I think you may need to enable the panic-on-oops kernel parameter, which makes every oops unrecoverable. — rvs, Apr 11 '16 at 21:02
Anything you do after a kernel oops is going to be unreliable. The only meaningful thing you could want a script to do after a kernel oops is collect information that could help identify the root cause of the oops. If all you want to collect is the `dmesg` output, then there are better ways to achieve that, such as a serial console or netconsole. — kasperd, Apr 11 '16 at 21:07
There are two things I think I might want to do: send an email notification, and automatically reboot. — Mark Raymond, Apr 12 '16 at 05:05

score 0 · Answer 1 · answered Apr 13 '16 at 05:04

You could install rsyslog and run the script via the Shell Execute action (`^program-to-execute;template`). However, that will probably not work: the system is likely to be unresponsive after a kernel oops and may never run the custom script.

Because of that, I suggest running the script on another server when a kernel oops occurs. For example:

On the server that may produce a kernel oops, forward kernel messages to a "monitor" server using the netconsole module.

# /etc/modprobe.d/netconsole.conf
# This example assumes 10.0.0.1 as the "bad" server and 10.0.0.2 as the "monitor" server
options netconsole netconsole=30514@10.0.0.1/eth0,30514@10.0.0.2/01:23:45:67:89:AB
options netconsole oops_only=1


# /etc/modules-load.d/netconsole.conf
# Tells 'systemd-modules-load' to load 'netconsole' automatically at boot
netconsole
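
The `netconsole=` value packs six fields into one string: source port, source IP and interface, then target port, target IP and target MAC address. The following is a minimal sketch for assembling that line, assuming a hypothetical helper named `netconsole_opts` (it is not part of any tool), which may help avoid misplaced separators:

```shell
#!/bin/bash
# Hypothetical helper: prints a modprobe.d "options" line for netconsole.
# Field order: src_port src_ip dev tgt_port tgt_ip tgt_mac
netconsole_opts() {
    local src_port="$1" src_ip="$2" dev="$3"
    local tgt_port="$4" tgt_ip="$5" tgt_mac="$6"
    printf 'options netconsole netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

Calling `netconsole_opts 30514 10.0.0.1 eth0 30514 10.0.0.2 01:23:45:67:89:AB` prints the same `options` line used in the example configuration. If the monitor server sits on a different subnet, the MAC address would normally be that of the gateway rather than of the monitor itself.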

On the monitor server (the one that receives the oops messages), run the custom script via rsyslog.

# /etc/rsyslog.d/kernel-oops-handler.conf

module(load="imudp")

input(type="imudp" 
    port="30514"
    ruleset="KernelOopsRuleSet")

# This aims to supply the IP address of the "bad" server in command line
template(name="KernelOopsArgs"
    type="string"
    string="%fromhost-ip%")

ruleset(name="KernelOopsRuleSet") {
    # This assumes the '------------[ cut here ]------------' marker indicates a kernel oops
    if ($msg contains "------------[ cut here ]------------") then {
        kern.crit ^/path/to/custom/script.sh;KernelOopsArgs
    }
}
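
Before relying on the rule, it may be worth checking which lines of a saved netconsole capture the `contains` condition would actually match. A rough sketch, with hypothetical helper names `has_oops_marker` and `count_oops_markers`, that mirrors the same substring test in plain shell:

```shell
#!/bin/bash
# Same marker string as in the rsyslog condition.
OOPS_MARKER='------------[ cut here ]------------'

# Hypothetical helper: succeeds if the given line contains the marker,
# mirroring the substring test performed by rsyslog's "contains".
has_oops_marker() {
    case "$1" in
        *"$OOPS_MARKER"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: counts matching lines in a capture read from stdin.
count_oops_markers() {
    local n=0 line
    while IFS= read -r line; do
        if has_oops_marker "$line"; then n=$((n + 1)); fi
    done
    echo "$n"
}
```

For example, `count_oops_markers < capture.log` (where `capture.log` is an assumed file of received messages) reports how many times the script would have been triggered. Oops reports that begin with a `BUG:` line and carry no cut-here marker may not match at all, which is the assumption flagged in the rule's comment.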

The custom script may restart the machine via the out-of-band management interface (iDRAC on Dell servers):

#!/bin/bash
# /path/to/custom/script.sh
# A successful SSH to the host indicates the server is still responsive
sleep 3
server="${1}"
if ! ssh -n -o ConnectTimeout=10 -o ControlPath=none "${server}" true; then
    # Assumes 10.100.0.1 is the iDRAC IP address of the server whose IP is 10.0.0.1
    idrac="$(echo "${server}" | sed 's/^10\.0\./10.100./')"
    # Trigger a forced reboot using 'ipmitool'
    ipmitool -H "${idrac}" -U root -P root chassis power reset
    # Notify administrators
    mail -s "Server '${server}' was restarted!" sysadmins@example.com < /dev/null
fi
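
One caveat with the `sed` mapping: an address that does not start with `10.0.` passes through unchanged, so `ipmitool` would be pointed at the server's own IP. A minimal sketch of a stricter mapping, assuming the same 10.0.x.y to 10.100.x.y convention and a hypothetical function name `idrac_for`:

```shell
#!/bin/bash
# Hypothetical helper: map a server IP (10.0.x.y) to its iDRAC IP (10.100.x.y).
# Returns non-zero for any other address instead of passing it through.
idrac_for() {
    case "$1" in
        10.0.*) printf '10.100.%s\n' "${1#10.0.}" ;;
        *) return 1 ;;
    esac
}
```

In the script, `idrac="$(idrac_for "${server}")" || exit 1` would then skip the reset when the address does not fit the expected scheme.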
