
I am seeing unexpected reboots on an embedded device. I can already detect a hardware watchdog reset thanks to an ioctl call. Now I would like to detect whether a kernel panic was the reason for a reboot. I found some articles about crashkernel and crashdump, but I was not able to make them work properly. I also don't want to store the kernel panic log; I just want to know whether a kernel panic happened.

My current idea is to write to a reserved area on the MMC. I already use a reserved area to handle a dual distribution system. Is this a good idea? Is it possible to write to the MMC during a kernel panic? I am not sure, but it seems that I can use some kind of kernel panic hook to run a routine on this event.

Is there no standard way to check at boot whether a kernel panic happened?

Thomas Weller
ArthurLambert
  • Well, it is not a good idea to work with the file system when the kernel has crashed. So I would suggest you avoid accessing the eMMC, as it would contain your rootfs. I am not sure if there is a kernel panic hook available. You can edit `panic.c` to toggle an LED (if there is one), send a command over the UART, or display some data on an LCD; anything, but try to avoid file system access during a kernel crash. – Gaurav Pathak Dec 11 '17 at 11:24
  • I cannot use an LED or the UART. The embedded device is not physically accessible. I am currently trying to use an unused register of my RTC to save the kernel panic event so that I can detect it on the next reboot. I am not sure this is the best practice for my use case. I am using the `atomic_notifier_chain_register` API to register a hook on kernel panic. – ArthurLambert Dec 11 '17 at 13:12
  • That sounds like a much better idea than trying to mess with the file system in a notifier. – tofro Dec 11 '17 at 16:26
  • You need to google how to use `pstore` and `ramoops`. – 0andriy Dec 13 '17 at 17:17

2 Answers


I was able to detect and debug the kernel panic thanks to the comment from @0andriy on the question.

Enable ramoops in the kernel defconfig:

+CONFIG_PSTORE=y
+CONFIG_PSTORE_ZLIB_COMPRESS=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y

Add code to your kernel board init to declare the ramoops memory region. You can also use the device tree, or even a parameter on the kernel command line. This example uses the code method; in my case it was in arch/arm/mach-imx/mach-imx6ul.c:

--- a/arch/arm/mach-imx/mach-imx6ul.c
+++ b/arch/arm/mach-imx/mach-imx6ul.c
@@ -21,6 +21,24 @@
 #include "cpuidle.h"
 #include "hardware.h"

+#include <linux/pstore_ram.h>
+#include <linux/memblock.h>
+
+static struct ramoops_platform_data ramoops_data = {
+       .mem_address = 0xXXXXXXXX, // Depends on the hardware
+       .mem_size = 0x00005000, // 20 KiB
+       .record_size = 0x00002000, // 8 KiB
+       .dump_oops = 1,
+};
+
+static struct platform_device ramoops_dev = {
+       .name = "ramoops",
+       .dev = {
+               .platform_data = &ramoops_data,
+       },
+};
+
+
 static void __init imx6ul_enet_clk_init(void)
 {
        struct regmap *gpr;
@@ -170,6 +188,14 @@ static inline void imx6ul_enet_init(void)
 static void __init imx6ul_init_machine(void)
 {
        struct device *parent;
+       int ret;
+
+       ret = platform_device_register(&ramoops_dev);
+       if (ret) {
+               printk(KERN_ERR "unable to register platform device\n");
+               return;
+       }
+       memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);

        parent = imx_soc_device_init();
        if (parent == NULL)

Then, on boot, I just have to look at the contents of ramoops to see whether a kernel panic log is available. I can mount the ramoops memory region with:

mount -t pstore -o kmsg_bytes=1000 - /sys/fs/pstore
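
Once pstore is mounted, the boot-time check could be automated along the following lines. This is only a minimal userspace sketch, not something taken from the kernel tree: the helper names `record_is_panic` and `count_panic_records` are made up for illustration, and it assumes that saved kernel logs appear as files whose names start with `dmesg-` (for example `dmesg-ramoops-0`) and that each record begins with the reason header pstore writes, such as `Panic#1 Part1`. With `.dump_oops = 1`, non-fatal oopses are recorded too, which is why the first line is inspected.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the file at `path` starts with "Panic#" (assumed pstore
 * reason header), 0 if it does not, and -1 if it cannot be opened. */
int record_is_panic(const char *path)
{
    char line[64];
    FILE *f = fopen(path, "r");
    int result;

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) == NULL)
        result = 0; /* empty record: nothing to match */
    else
        result = strncmp(line, "Panic#", 6) == 0;
    fclose(f);
    return result;
}

/* Count the dmesg-* records in `dir` that look like kernel panics.
 * Returns -1 if the directory cannot be opened. */
int count_panic_records(const char *dir)
{
    char path[4096];
    DIR *d = opendir(dir);
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        /* Skip console-*, pmsg-* and any other non-dmesg entries. */
        if (strncmp(e->d_name, "dmesg-", 6) != 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (record_is_panic(path) == 1)
            count++;
    }
    closedir(d);
    return count;
}
```

An init script or small boot-time program could call `count_panic_records("/sys/fs/pstore")` and treat a result greater than zero as "the previous boot ended in a panic". Deleting the record files afterwards frees the slots in the ramoops region, so the same panic is not reported again on the following boot.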
ArthurLambert

Here's how Windows handles it:

  • stop using drivers
  • write to disk using BIOS routines (or something similarly low-level)
  • write the kernel dump into the page file (the only place known to be contiguous and safe to write to without damaging anything)
  • on next boot, check if the page file contains a crash dump signature

You might be able to apply this concept to Linux, e.g. write to the swap partition and check the contents of the swap partition at next startup.
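
The "check at next startup" half of that idea might look like the sketch below. Everything specific in it is hypothetical: the 8-byte marker `KPANIC01`, the helper names, and the offset are invented for illustration, and the hard part (writing the marker safely from panic context) is not covered at all.

```c
#include <stdio.h>
#include <string.h>

#define CRASH_MAGIC     "KPANIC01" /* hypothetical marker, not a real format */
#define CRASH_MAGIC_LEN 8

/* Return 1 if the marker is present at `offset` in `path`, 0 if it is
 * absent, and -1 on I/O error. */
int has_crash_signature(const char *path, long offset)
{
    char buf[CRASH_MAGIC_LEN];
    FILE *f = fopen(path, "rb");
    int found;

    if (f == NULL)
        return -1;
    if (fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    if (fread(buf, 1, CRASH_MAGIC_LEN, f) != CRASH_MAGIC_LEN)
        found = 0; /* short read: the marker cannot be there */
    else
        found = memcmp(buf, CRASH_MAGIC, CRASH_MAGIC_LEN) == 0;
    fclose(f);
    return found;
}

/* Overwrite the marker with zeros so the next boot does not report the
 * same crash again. Returns 0 on success, -1 on error. */
int clear_crash_signature(const char *path, long offset)
{
    char zeros[CRASH_MAGIC_LEN] = {0};
    FILE *f = fopen(path, "r+b");
    int ok;

    if (f == NULL)
        return -1;
    ok = fseek(f, offset, SEEK_SET) == 0 &&
         fwrite(zeros, 1, CRASH_MAGIC_LEN, f) == CRASH_MAGIC_LEN;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

At boot, `path` would be the block device or reserved area chosen for the marker, and `offset` a location that normal use of that area does not overwrite; both are left to the reader because they depend on the partition layout.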

Thomas Weller