I am working on a kernel module where I need to be "aware" that a given process has crashed.

Right now my approach is to set up a periodic timer interrupt in the kernel module; on every timer interrupt, I check the `task_struct.state` and `task_struct.exit_state` values for that process.

I am wondering if there's a way to set up an interrupt in the kernel module that would go off when the process terminates, or when the process receives a given signal (e.g., SIGINT or SIGHUP).

Thanks!

EDIT: A catch here is that I can't modify the user application. Or at least, it would be a much tougher sell to the customer if I place additional requirements/constraints on s/w from another vendor...

Yaron Shragai

1 Answer

You could have your module create a character device node and then open that node from your userspace process. It's only about a dozen lines of boilerplate to register a simple cdev in your module. Your cdev's open method will get called when the process opens the device node and the release method will be called when the device node is closed. If a process exits, either intentionally or because of a signal, all open file descriptors are closed by the kernel. So you can be certain that release will be called. This avoids any need to poll the process status and you can avoid modifying any kernel code outside of your module.
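The guarantee this relies on (the kernel closes every open descriptor when a process dies, however it dies) can be observed from userspace without writing any module. Here is a minimal sketch that uses a pipe as a stand-in for the device node: the parent seeing EOF plays the role of the cdev's release method being called. The helper name `fd_closed_on_death` is made up for illustration.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the answer): fork a child that holds the
 * write end of a pipe and then dies from `sig`, or exits normally when
 * sig == 0. The child never calls close() on the write end itself.
 *
 * Returns 1 if the parent sees EOF on the read end, meaning the kernel
 * closed the child's descriptor; 0 if data arrived instead; -1 on error.
 */
int fd_closed_on_death(int sig)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (pid == 0) {
        close(p[0]);          /* child keeps only the write end */
        if (sig)
            raise(sig);       /* die abruptly, e.g. SIGKILL */
        _exit(0);             /* normal exit, still without close() */
    }

    close(p[1]);              /* parent must drop its own write end */

    char c;
    ssize_t n = read(p[0], &c, 1);  /* blocks until every write end is closed */

    close(p[0]);
    waitpid(pid, NULL, 0);    /* reap the child */
    return n == 0 ? 1 : 0;
}
```

Calling `fd_closed_on_death(SIGKILL)` and `fd_closed_on_death(0)` should both report that the descriptor was closed. In the real design the cdev's release method takes the place of the parent's `read()` returning EOF.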

You could also set up a watchdog-style system, where your process must write one byte to the device every so often. Have the write method of the cdev reset a timer. If too much time passes without a write and the timer expires, the process is assumed to have failed somehow, even if it hasn't crashed and terminated. This would catch, for instance, a programming bug that caused a mutex deadlock or put the process into an infinite loop.
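The bookkeeping behind the watchdog idea is small. Below is a userspace model of it, just to make the logic concrete: the struct, the function names, and the tick units are all invented for this sketch. In an actual module the write method would more likely re-arm a kernel timer (e.g. with `mod_timer()`) than compare timestamps by hand.

```c
#include <stdbool.h>

/* Hypothetical model of the watchdog state; not kernel code. */
struct watchdog {
    unsigned long last_kick;  /* tick count at the most recent write */
    unsigned long timeout;    /* ticks allowed between writes */
};

/* Start the watchdog as if a write had just happened at `now`. */
void wd_init(struct watchdog *wd, unsigned long now, unsigned long timeout)
{
    wd->last_kick = now;
    wd->timeout = timeout;
}

/* What the cdev's write method would do: record the heartbeat. */
void wd_kick(struct watchdog *wd, unsigned long now)
{
    wd->last_kick = now;
}

/* True once more than `timeout` ticks have passed since the last write. */
bool wd_expired(const struct watchdog *wd, unsigned long now)
{
    return now - wd->last_kick > wd->timeout;
}
```

With a timeout of 5 ticks and the last write at tick 100, `wd_expired()` is false at tick 105 and true at tick 106; a `wd_kick()` at that point starts the window over.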

There is a point in the kernel code where signals are delivered to user processes. You could patch that, check the process name, and wake up whatever is waiting in your module if it matches. This would only catch signals, not intentional process exits. IMHO, this is much uglier and you'll need to deal with maintaining a kernel patch. But it's not that hard: there's a single point (I don't recall what function, sorry) where one can insert the necessary code and it will catch all signals.

TrentP
  • Thanks! Here's a catch, that I should have mentioned: I can't modify the user application. Or at least, it would be a much tougher sell to the customer if I place additional requirements/constraints on s/w from another vendor... Perhaps the patch method is something to look at, but I suppose the complexity trade-off would only be worth it if the performance degradation due to the periodic timer interrupts hits the acceptability threshold... – Yaron Shragai Apr 11 '17 at 20:28
  • Have a wrapper that starts the application and signals your module? The parent of a process (i.e. the wrapper) will get notified via `waitpid()` et al. if the child (the application process) crashes or exits. You could even have the wrapper open the driver's device node and then `exec()` the app without closing the device node. The child will inherit the open file descriptor and keep it open until it closes it or terminates. The wrapper doesn't need to exist past starting the app. – TrentP Apr 11 '17 at 21:04