
I'm new to kernel programming and I couldn't find enough information to know why this happens. Basically, I'm trying to replace the page fault handler in the kernel's IDT with something simple that calls the original handler at the end. I just want this function to print a notification that it was called, but calling printk() inside it always results in a kernel panic. It runs fine otherwise.

#include <asm/desc.h>
#include <linux/mm.h>
#include <asm/traps.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <asm/uaccess.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/desc_defs.h>
#include <linux/moduleparam.h>

#define PAGEFAULT_INDEX 14


// Old and new IDT registers
static struct desc_ptr old_idt_reg, new_idt_reg;

static __attribute__((__used__)) unsigned long old_pagefault_pointer, new_page;



// The function that replaces the original handler
asmlinkage void isr_pagefault(void);
asm("    .text");
asm("    .type isr_pagefault,@function");
asm("isr_pagefault:");
asm("    callq print_something");
asm("    jmp *old_pagefault_pointer");



void print_something(void) {
    // This printk causes the kernel to crash!
    printk(KERN_ALERT "Page fault handler called\n");

    return;

}

void my_idt_load(void *ptr) {
    printk(KERN_ALERT "Loading on a new processor...\n");
    load_idt((struct desc_ptr*)ptr);

    return;
}



int module_begin(void) {

    gate_desc *old_idt_addr, *new_idt_addr;
    unsigned long idt_length;

    store_idt(&old_idt_reg);

    old_idt_addr = (gate_desc*)old_idt_reg.address;
    idt_length   = old_idt_reg.size;

    // Get the pagefault handler pointer from the IDT's pagefault entry
    old_pagefault_pointer = 0
        | ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_high)   << 32   )
        | ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_middle) << 16   )
        | ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_low)            );

    printk(KERN_ALERT "Saved pointer to old pagefault handler: %p\n", (void*)old_pagefault_pointer);

    // Allocate a new page for the new IDT
    new_page = __get_free_page(GFP_KERNEL);
    if (!new_page)
        return -ENOMEM; // Report the allocation failure with a proper errno value

    // Copy the original IDT to the new page
    memcpy((void*)new_page, old_idt_addr, idt_length);

    // Set up the new IDT
    new_idt_addr = (gate_desc*)new_page; // Typed view of the new table for pack_gate()
    new_idt_reg.address = new_page;
    new_idt_reg.size = idt_length;
    pack_gate(
        &new_idt_addr[PAGEFAULT_INDEX],
        GATE_INTERRUPT,
        (unsigned long)isr_pagefault, // The interrupt written in assembly at the start of the code
        0, 0, __KERNEL_CS
    );

    // Load the new table
    load_idt(&new_idt_reg);
    smp_call_function(my_idt_load, (void*)&new_idt_reg, 1); // Call load_idt on the rest of the cores

    printk(KERN_ALERT "New IDT loaded\n\n");

    return 0;

}

void module_end(void) {

    printk( KERN_ALERT "Exit handler called now. Reverting changes and exiting...\n\n");

    load_idt(&old_idt_reg);
    smp_call_function(my_idt_load, (void*)&old_idt_reg, 1);

    if (new_page)
        free_page(new_page);

}

module_init(module_begin);
module_exit(module_end);

Many thanks to anyone who can tell me what I'm doing wrong here.

Bayou
Awais Chishti
  • I thought printk could be modifying some of the registers so I tried saving all of the registers on the stack before making the call. The results were the same. – Awais Chishti Jul 10 '17 at 13:14
  • *"it always results in a kernel panic"* -- Good, you've posted your code, but where's the crash dump and backtrace? – sawdust Jul 10 '17 at 18:58
  • Just don't use *printk()* there. It doesn't look like a suitable debugging mechanism in your case. – 0andriy Jul 11 '17 at 23:38
  • @0andriy Probably not suitable, but as far as I've read about using `printk()` in interrupt handlers, the code shouldn't crash unless there's another problem. – Awais Chishti Jul 12 '17 at 01:11

2 Answers


Sorry for resurrecting a dead post, but just for posterity:

I've run into similar issues when hooking IDT entries; one possibility is insufficient stack space. In 64-bit mode, when a trap or fault handler is called, the CPU determines a new stack pointer based on both the "interrupt stack table" (IST) field – bits 32 to 34 – of the corresponding interrupt descriptor, and the processor core's Task State Segment (TSS). From Volume 3A, section 6.14.5 of the Intel Software Developer's Manual:

In IA-32e mode, a new interrupt stack table (IST) mechanism is available as an alternative to the modified legacy stack-switching mechanism described above. This mechanism unconditionally switches stacks when it is enabled. It can be enabled on an individual interrupt-vector basis using a field in the IDT entry. This means that some interrupt vectors can use the modified legacy mechanism and others can use the IST mechanism.

The IST mechanism is only available in IA-32e mode. It is part of the 64-bit mode TSS. The motivation for the IST mechanism is to provide a method for specific interrupts (such as NMI, double-fault, and machine-check) to always execute on a known good stack. In legacy mode, interrupts can use the task-switch mechanism to set up a known good stack by accessing the interrupt service routine through a task gate located in the IDT. However, the legacy task-switch mechanism is not supported in IA-32e mode.

The IST mechanism provides up to seven IST pointers in the TSS. The pointers are referenced by an interrupt-gate descriptor in the interrupt-descriptor table (IDT); see Figure 6-8. The gate descriptor contains a 3-bit IST index field that provides an offset into the IST section of the TSS. Using the IST mechanism, the processor loads the value pointed by an IST pointer into the RSP.

... If the IST index is zero, the modified legacy stack-switching mechanism described above is used.

The "modified legacy stack-switching" mechanism is described in section 6.14.2 of the same chapter; most importantly, when the fault causes a privilege-level change to ring 0, it just loads the RSP0 entry of the TSS as the new stack pointer. Here is the figure that describes the TSS:

[Figure: 64-bit TSS format]

So, to summarize: if the IST field of the interrupt descriptor is 0, the RSP0 entry of the TSS is loaded as the new stack pointer, and if the IST field is non-zero, the TSS entry it indicates is loaded instead. In x64 Linux the IST field is 0 for page faults, so rsp is switched to the RSP0 entry of the TSS whenever a page fault arrives from user mode. Unfortunately, the stack space allocated here is rather small; playing around with a kernel debugger revealed that Linux allocates only 512 bytes for this stack, and my suspicion is that printk requires more stack space than that.
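The selection rule summarized above can be sketched in plain C. This is a user-space model for illustration only, not kernel code: `struct gate64` mirrors the 16-byte gate layout from the SDM, while `struct tss64_stacks`, `gate_handler`, `gate_ist`, and `stack_for_gate` are hypothetical names introduced here. The sketch also models one detail of the modified legacy mechanism: with an IST index of zero, RSP0 is loaded only when the privilege level changes, so a fault taken while already in ring 0 stays on the current stack.

```c
#include <assert.h>
#include <stdint.h>

/* 16-byte IA-32e interrupt/trap gate, field by field. */
struct gate64 {
    uint16_t offset_low;    /* handler address bits 0..15              */
    uint16_t selector;      /* code segment selector                   */
    uint8_t  ist;           /* low 3 bits: IST index (bits 32..34)     */
    uint8_t  type_attr;     /* gate type, DPL, present bit             */
    uint16_t offset_middle; /* handler address bits 16..31             */
    uint32_t offset_high;   /* handler address bits 32..63             */
    uint32_t reserved;
};

/* Hypothetical, simplified view of the TSS: stack pointers only. */
struct tss64_stacks {
    uint64_t rsp[3]; /* RSP0..RSP2 */
    uint64_t ist[7]; /* IST1..IST7 */
};

/* Reassemble the 64-bit handler address from its three pieces. */
static uint64_t gate_handler(const struct gate64 *g)
{
    return ((uint64_t)g->offset_high << 32) |
           ((uint64_t)g->offset_middle << 16) |
            (uint64_t)g->offset_low;
}

/* Extract the 3-bit IST index. */
static unsigned gate_ist(const struct gate64 *g)
{
    return g->ist & 0x7u;
}

/*
 * Model which stack pointer is used when an exception is delivered
 * through gate g to a ring 0 handler.
 * cpl_changes: nonzero if the interrupted code was not in ring 0.
 * current_rsp: RSP at the time of the fault.
 * The 16-byte alignment the CPU applies is not modeled.
 */
static uint64_t stack_for_gate(const struct gate64 *g,
                               const struct tss64_stacks *t,
                               int cpl_changes, uint64_t current_rsp)
{
    unsigned ist = gate_ist(g);

    assert(ist <= 7);
    if (ist != 0)
        return t->ist[ist - 1]; /* IST: unconditional switch         */
    if (cpl_changes)
        return t->rsp[0];       /* modified legacy mechanism: RSP0   */
    return current_rsp;         /* same privilege level: no switch   */
}
```

With a page fault gate whose IST index is 0, `stack_for_gate` returns the model's RSP0 for a user-mode fault, which is the case described above.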

One possible solution is to manually switch the stack pointer, at the beginning of your page fault hook, to the RSP1 entry of the TSS, which should contain the current kernel stack and hence have ample room for printk. This is a very hacky and inelegant solution, but in my experience it does the trick. (To find the address of the TSS, use str to read the Task Register (tr), then take the base address from the corresponding GDT entry, which is called the "TSS Descriptor". See section 7.2.3 of Volume 3A for details.)
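The lookup in the parenthesis above can be sketched as follows. This is only a decoding sketch that works on raw bytes, so it can be tried in user space on a synthetic table; it does not execute `str` or `sgdt` itself. The function names `tss_base_from_gdt` and `tss_read_rsp1` and the macro `TSS_RSP1_OFFSET` are hypothetical, and the byte positions are taken from the 64-bit TSS descriptor and TSS layouts in the SDM.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of RSP1 inside the 64-bit TSS (RSP0 sits at 0x04). */
#define TSS_RSP1_OFFSET 0x0c

/*
 * Given the bytes of the GDT and the selector read from the task
 * register, reassemble the 64-bit base address stored in the 16-byte
 * TSS descriptor. The low three selector bits (RPL and TI) are masked
 * off, which leaves the byte offset of the descriptor in the table.
 */
static uint64_t tss_base_from_gdt(const uint8_t *gdt, uint16_t tr)
{
    const uint8_t *d = gdt + (tr & ~0x7u);
    uint64_t base = 0;

    base |= (uint64_t)d[2];         /* base bits 0..7   */
    base |= (uint64_t)d[3]  << 8;   /* base bits 8..15  */
    base |= (uint64_t)d[4]  << 16;  /* base bits 16..23 */
    base |= (uint64_t)d[7]  << 24;  /* base bits 24..31 */
    base |= (uint64_t)d[8]  << 32;  /* base bits 32..39 */
    base |= (uint64_t)d[9]  << 40;  /* base bits 40..47 */
    base |= (uint64_t)d[10] << 48;  /* base bits 48..55 */
    base |= (uint64_t)d[11] << 56;  /* base bits 56..63 */
    return base;
}

/* Read the RSP1 field from the bytes of a TSS (field is unaligned). */
static uint64_t tss_read_rsp1(const uint8_t *tss)
{
    uint64_t rsp1;

    memcpy(&rsp1, tss + TSS_RSP1_OFFSET, sizeof(rsp1));
    return rsp1;
}
```

In a real hook, the GDT bytes and the selector would come from `sgdt` and `str` on the faulting core, and the value from `tss_read_rsp1` would only be usable as a stack pointer if the running kernel actually keeps the task stack there, as assumed above.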


DISCLAIMER: there is, however, a major caveat today that was not relevant at the time you asked this question: the Kernel Page Table Isolation mitigations introduced in response to Meltdown cause a different, fatal problem with this kind of hooking. In particular, your new Interrupt Descriptor Table will not be accessible under a user-mode value of cr3, so any fault in user-land will cause a triple fault once you've loaded the new IDT (first the original fault, then a page fault because the IDT address is not present in the user-mode page tables, and then a triple fault because the double fault entry of the IDT is not accessible either). Short of manually changing all of the user-mode page tables, this renders your IDT hooking approach impossible.

The only solution is to manually overwrite an area of memory that you know will be present in user-mode page tables; for example, the original IRQ handler referenced in the IDT will point to a small segment of code that is always present in user-mode page tables and whose role is to change cr3 to the kernel-mode variant. Linux does this by clearing bits 11 and 12 of cr3, so you could overwrite this area of code with a small assembly stub that clears those bits and then jumps to your hook. As a proof of concept see here.
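The bit manipulation mentioned above is small enough to show in C. This is only the arithmetic, under the assumption stated in the text that bits 11 and 12 distinguish the user-mode cr3 value from the kernel-mode one; a real stub would have to be assembly placed in code that is mapped in the user-mode page tables. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Bits assumed to mark the user-mode variant of cr3 (see text). */
#define USER_CR3_PCID_BIT    11
#define USER_CR3_PGTABLE_BIT 12

/* Return the kernel-mode cr3 value for a given user-mode cr3 value. */
static uint64_t kernel_cr3_from_user(uint64_t cr3)
{
    const uint64_t mask = (1ULL << USER_CR3_PCID_BIT) |
                          (1ULL << USER_CR3_PGTABLE_BIT);

    return cr3 & ~mask;
}
```

Applying the function to a value that already has both bits clear leaves it unchanged, so the stub would be harmless if entered with the kernel-mode cr3 already loaded.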

Atticus Stonestrom

As far as I know, printk() requires far more resources and involves more complexity (console, file system, storage) than ftrace. If the crash only happens when you use printk(), why not use ftrace instead?

Many Linux kernel experts love ftrace.