I would like to run a custom script when a kernel oops occurs. Is this possible, and if so how?

Mark Raymond
  • What are you trying to achieve? You can probably use kdump and attach some other actions to it, but I think you may need to enable the crash on oops kernel parameter, which makes the oops unrecoverable. – rvs Apr 11 '16 at 21:02
  • Anything you do after a kernel oops is going to be unreliable. The only meaningful thing a script could do after a kernel oops is collect information that might help identify its root cause. If all you want to collect is the `dmesg` output, there are better ways to achieve that, such as a serial console or netconsole. – kasperd Apr 11 '16 at 21:07
  • There are two things I think I might want to do: send an email notification, and automatically reboot. – Mark Raymond Apr 12 '16 at 05:05

1 Answer

You could install rsyslog and run the script through its shell execute action (`^program-to-execute;template`). However, this will probably not work: after a kernel oops the system is likely to be unresponsive and may never run the custom script.

For that reason, I suggest running the script on another server when the kernel oops occurs. For example:

  1. On the server that produces the kernel oops, forward kernel messages to a "monitor" server using the netconsole module.

    # /etc/modprobe.d/netconsole.conf
    # This example assumes 10.0.0.1 as the "bad" server and 10.0.0.2 as the "monitor" server
    options netconsole netconsole=30514@10.0.0.1/eth0,30514@10.0.0.2/01:23:45:67:89:AB
    options netconsole oops_only=1
    
    
    # /etc/modules-load.d/netconsole.conf
    # Tells 'systemd-modules-load' to load 'netconsole' automatically at boot
    netconsole
    
  2. On the monitor server (the one that receives the oops messages), run the custom script via rsyslog.

    # /etc/rsyslog.d/kernel-oops-handler.conf
    
    module(load="imudp")
    
    input(type="imudp" 
        port="30514"
        ruleset="KernelOopsRuleSet")
    
    # This aims to supply the IP address of the "bad" server in command line
    template(name="KernelOopsArgs"
        type="string"
        string="%fromhost-ip%")
    
    ruleset(name="KernelOopsRuleSet") {
        # This assumes that the '--[ cut here ]--' marker line indicates a kernel oops
        if ($msg contains "------------[ cut here ]------------") then {
            kern.crit ^/path/to/custom/script.sh;KernelOopsArgs
        }
    }
    
  3. The custom script may restart the machine via the out-of-band management interface (iDRAC on Dell servers):

    #!/bin/bash
    # /path/to/custom/script.sh
    # A successful SSH connection indicates that the server is still responsive
    sleep 3
    server="${1}"
    if ! ssh -n -o ConnectTimeout=10 -o ControlPath=none "${server}" true; then
        # Assumes 10.100.0.1 is the iDRAC IP address of the server whose IP is 10.0.0.1
        idrac="$(echo "${server}" | sed 's/^10\.0\./10.100./')"
        # Trigger a forced reboot using 'ipmitool'
        ipmitool -H "${idrac}" -U root -P root chassis power reset
        # Notify administrators
        mail -s "Server '${server}' was restarted!" sysadmins@example.com < /dev/null
    fi
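
Before relying on this setup, it may be worth checking the monitor side without waiting for a real oops. The following is a minimal sketch, not part of the original answer: the function names `oops_test_message` and `send_test_oops` are hypothetical, the address and port are the example values used above, and it assumes a bash build with `/dev/udp` redirection enabled. It sends a single UDP line with syslog priority `<2>` (facility kern = 0, severity crit = 2) containing the marker string that the ruleset looks for.

```shell
#!/bin/bash
# Hypothetical smoke test for the rsyslog rule on the monitor server.
# The address 10.0.0.2 and port 30514 are the example values from the steps above.

# Build a syslog line whose priority is kern.crit (facility 0, severity 2).
oops_test_message() {
    local facility=0 severity=2
    printf '<%d>%s' "$(( facility * 8 + severity ))" '------------[ cut here ]------------'
}

# Send the line to the monitor over UDP using bash's /dev/udp redirection.
# Assumption: bash was built with network redirections enabled.
send_test_oops() {
    local monitor="${1:-10.0.0.2}" port="${2:-30514}"
    oops_test_message > "/dev/udp/${monitor}/${port}"
}

# Usage (from any test machine that can reach the monitor):
#   send_test_oops 10.0.0.2 30514
```

With the template shown earlier, the handler should receive the IP address of the machine that sent the test line. If that machine answers SSH, the script should exit without triggering a reboot, which makes this a reasonably safe check. If the handler never runs, one thing to look at is how rsyslog parsed the header-less line; matching on `$rawmsg` instead of `$msg` may be needed, but that is an assumption to verify on your rsyslog version.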