What will happen if we sleep in an interrupt handler on an SMP machine?

I wrote a sample keyboard driver and added a sleep to the interrupt handler:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/delay.h>
#include <linux/sched/signal.h>

MODULE_LICENSE("GPL");
static int irq = 1, dev = 0xaa, counter = 0;

static irqreturn_t keyboard_handler(int irq, void *dev)
{
        pr_info("Keyboard Counter:%d\n", counter++);
        msleep(1000);
        return IRQ_NONE;
}
/* registering irq */
static int test_interrupt_init(void)
{
        pr_info("%s: In init\n", __func__);
        return request_irq(irq, keyboard_handler, IRQF_SHARED, "my_keyboard_handler", &dev);
}

static void test_interrupt_exit(void)
{
        pr_info("%s: In exit\n", __func__);
        synchronize_irq(irq); /* synchronize interrupt */
        free_irq(irq, &dev);
}

module_init(test_interrupt_init);
module_exit(test_interrupt_exit);

The system ran for a few minutes and then panicked. Why can't the system keep working with one processor stalled, given that timer interrupts will still fire on the other CPUs and can schedule processes?

Backtrace captured using a kgdb setup:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1254]
0x0000000000000000 in irq_stack_union ()

(gdb) bt
#0  0x0000000000000000 in irq_stack_union ()
#1  0xffffffff810ad8e4 in ttwu_activate (en_flags=<optimized out>, p=<optimized out>, rq=<optimized out>)
    at kernel/sched/core.c:1638
#2  ttwu_do_activate (rq=0xffff888237422c40, p=0xffffffff82213780 <init_task>, wake_flags=9, 
    rf=0xffffc900022b3f10) at kernel/sched/core.c:1697
#3  0xffffffff810aec00 in sched_ttwu_pending () at kernel/sched/core.c:1740
#4  0xffffffff810aedcd in scheduler_ipi () at kernel/sched/core.c:1771
#5  0xffffffff81a01aef in reschedule_interrupt () at arch/x86/entry/entry_64.S:888
#6  0xffffc900022b3f58 in ?? ()
#7  0xffffffff81a01aea in reschedule_interrupt () at arch/x86/entry/entry_64.S:888
#8  0x0000000000000002 in irq_stack_union ()
#9  0x00007fcec3421b40 in ?? ()
#10 0x0000000000000006 in irq_stack_union ()
#11 0x00007fceb00008c0 in ?? ()
#12 0x0000000000000002 in irq_stack_union ()
#13 0x00000000020bd380 in ?? ()
#14 0x0012c8d2cc413914 in ?? ()
Cannot access memory at address 0x5000
red0ct
md.jamal
  • Consider what happens when a regular user-mode program calls `sleep()` -- the `sleep()` system call tells the OS that the current thread wants to sleep, and the OS's scheduler responds by moving the current thread out of the ready-to-execute-threads list and onto the sleeping-threads-list, and sets up a timer to go off at the appropriate time so that the scheduler can wake the thread up again by reversing the aforementioned steps. Now consider what happens when `sleep()` is called from within an interrupt handler -- what thread should the scheduler migrate to another list? – Jeremy Friesner Jan 06 '19 at 03:29
  • The answer to this question might be more precise if you attach kernel panic log – Alex Hoppus Jan 06 '19 at 17:38
  • Updated post with code and back trace when kernel received SIGSEGV – md.jamal Jan 07 '19 at 05:37
  • You cause recursive IRQs and fill up your stack. On top of this, you have no way out of the IRQ handler. And the tragic mistake is sleeping in atomic context. – 0andriy Jan 11 '19 at 22:12
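
The point made in the comments above can be sketched with a toy user-space model. This is not kernel code: `toy_task`, `toy_cpu`, `toy_irq_enter()`, `toy_irq_exit()` and `toy_sleep()` are hypothetical names invented for illustration. The model assumes that sleeping means parking the current task, and that an interrupt handler has no task of its own, only the task it happened to interrupt. The real kernel detects this situation and typically reports "BUG: scheduling while atomic".

```c
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
enum toy_state { TOY_RUNNABLE, TOY_SLEEPING };

struct toy_task {
    const char *name;
    enum toy_state state;
};

struct toy_cpu {
    struct toy_task *current; /* task that was running on this CPU */
    int atomic_depth;         /* > 0 while an interrupt handler runs */
};

/* Entering and leaving a (possibly nested) interrupt handler. */
static void toy_irq_enter(struct toy_cpu *cpu) { cpu->atomic_depth++; }
static void toy_irq_exit(struct toy_cpu *cpu)  { cpu->atomic_depth--; }

/*
 * Park the current task. Returns 0 on success and -1 if refused.
 * Inside a handler the only candidate is the interrupted task, which
 * did not ask to sleep, so the model refuses instead of parking it.
 */
static int toy_sleep(struct toy_cpu *cpu)
{
    if (cpu->atomic_depth > 0)
        return -1; /* atomic context: nothing valid to put to sleep */
    if (cpu->current == NULL)
        return -1; /* no task at all on this CPU */
    cpu->current->state = TOY_SLEEPING;
    return 0;
}
```

Calling `toy_sleep()` between `toy_irq_enter()` and `toy_irq_exit()` is refused and leaves the interrupted task runnable; the same call from ordinary task context succeeds and marks the task as sleeping.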

1 Answer

The keyboard interrupt can happen at any time - including during a kernel call. Normally, that’s OK: the interrupt happens, the driver does its thing, the interrupt handler returns, and the kernel continues.

But if you sleep in the interrupt handler (`msleep()` in your code), the handler doesn't return and the kernel is left in an intermediate state. If the interrupted code was holding a lock, other processors that need that lock can’t make progress in the kernel. Each will be forced to pause waiting for a lock holder that isn’t coming back. No wonder it panics!
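
A common way around this, not part of the original question, is to keep the handler short and defer anything slow to a context that is allowed to sleep. In Linux that role is played by mechanisms such as `request_threaded_irq()` or workqueues. The sketch below is only a user-space toy model of the split, and `toy_keyboard`, `toy_top_half()` and `toy_bottom_half()` are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
struct toy_keyboard {
    int pending;   /* events recorded by the top half, not yet handled */
    int processed; /* events fully handled by the bottom half */
    bool in_irq;   /* true while the top half is running */
};

/* Top half: models interrupt context. It only records the event. */
static void toy_top_half(struct toy_keyboard *kbd)
{
    kbd->in_irq = true;
    kbd->pending++; /* quick bookkeeping, no blocking calls */
    kbd->in_irq = false;
}

/*
 * Bottom half: models task context, where blocking would be allowed.
 * Returns the number of events it processed.
 */
static int toy_bottom_half(struct toy_keyboard *kbd)
{
    int done = 0;

    if (kbd->in_irq)
        return 0; /* never do the slow work from interrupt context */

    while (kbd->pending > 0) {
        /* slow work, such as the sleep from the question, belongs here */
        kbd->pending--;
        kbd->processed++;
        done++;
    }
    return done;
}
```

Two calls to `toy_top_half()` followed by one call to `toy_bottom_half()` drain both recorded events. The design choice being illustrated is that the interrupt path stays bounded and non-blocking, while the slow part runs where the scheduler can safely suspend it.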

John Burger
  • So, to make things clear, there is only one kernel running on multiple processors – md.jamal Jan 06 '19 at 05:55
  • Yes, that is implied when you say “SMP”. Asymmetric MultiProcessing (AMP) is when there are multiple kernels. But there are critical times in a kernel’s function when it cannot be re-entered. To prevent this it usually claims a lock - or even disables interrupts momentarily - but if a lock is held when your keyboard interrupt happens, then other processors can’t run kernel code any more – John Burger Jan 06 '19 at 05:59