I am working on enabling Intel SGX on a unikernel that has no native ring 3 support. To invoke the user-mode SGX instructions, I therefore need to implement a ring-switch routine. I followed JamesM's tutorial (10.-User Mode (jamesmolloy.co.uk), which is a 32-bit solution) to draft a long-mode version:

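void switch_to_ring3(void)
{
    /* Build an iretq frame (SS, RSP, RFLAGS, CS, RIP) that resumes at label 1 with CPL 3. */
    asm volatile(
        "mov $0x23, %%eax\n\t"        /* user data selector (RPL 3) */
        "mov %%ax, %%ds\n\t"
        "mov %%ax, %%es\n\t"
        "mov %%rsp, %%rax\n\t"        /* keep using the current stack */
        "push $0x23\n\t"              /* SS */
        "push %%rax\n\t"              /* RSP */
        "pushfq\n\t"                  /* RFLAGS */
        "push $0x1B\n\t"              /* CS: user code selector (RPL 3) */
        "lea 1f(%%rip), %%rax\n\t"    /* RIP: the instruction after iretq */
        "push %%rax\n\t"
        "iretq\n"
        "1:\n\t"
        : : : "rax", "memory");       /* rax is overwritten; memory is written by the pushes */
}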

I am sure that I have set up the GDT entries properly and that 0x23/0x1B are the selectors of the user-mode data/code descriptors; the code descriptor value is 0xaffb000000ffff and the data descriptor value is 0xaff3000000ffff.
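
For reference, the two descriptor values above can be decoded field by field. The following is only a minimal sketch, assuming the standard x86-64 code/data segment descriptor layout (access byte in bits 40-47, flags in bits 52-55); the struct and function names are hypothetical and not part of the unikernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the descriptor fields relevant to a ring switch. */
struct seg_desc_fields {
    bool     present;     /* P,   bit 47 */
    unsigned dpl;         /* DPL, bits 45-46 */
    bool     is_code;     /* type bit 3 (bit 43): 1 = code, 0 = data */
    bool     conforming;  /* type bit 2 (bit 42): conforming, code segments only */
    bool     long_mode;   /* L,   bit 53 */
    bool     db;          /* D/B, bit 54 */
    bool     granularity; /* G,   bit 55 */
};

/* Split a raw 64-bit GDT code/data descriptor into its flag fields. */
static struct seg_desc_fields decode_desc(uint64_t d)
{
    struct seg_desc_fields f;
    f.present     = (d >> 47) & 1;
    f.dpl         = (unsigned)((d >> 45) & 3);
    f.is_code     = (d >> 43) & 1;
    f.conforming  = f.is_code && ((d >> 42) & 1);
    f.long_mode   = (d >> 53) & 1;
    f.db          = (d >> 54) & 1;
    f.granularity = (d >> 55) & 1;
    return f;
}
```

Calling decode_desc(0xaffb000000ffff) reports a present, DPL 3, non-conforming code segment with L set; calling it on 0xaff3000000ffff reports a DPL 3 data segment that also has L set.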

What's strange is that iretq executes successfully and rip advances to the instruction after iretq, which is a nop with optimization disabled and a ret with optimization enabled. However, when that next instruction executes, the unikernel dies without any output (it has an exception handler that prints something even for unhandled exceptions). When I debug with GDB, GDB reports that the program received SIGQUIT.

I checked the registers but found nothing wrong: cs is 0x1b; ss, ds and es are 0x23; and rip points correctly to the instruction after iretq.

I am really confused about why it receives SIGQUIT. If an exception had occurred, the handler should print the dump message, or at least the QEMU log should contain a 'check_exception' entry, but the log is empty. Everything seems okay: the segment registers and rsp/rbp/rip are correct, the kernel code segment is accessible from user mode because the conforming bit of its descriptor is set, and the base address fields of all descriptors are 0x0.

I have been stuck on this problem for a whole day and cannot find a solution. I hope someone here can save my life T_T

Xiangyi Meng
  • Data descriptor must be cff3, not aff3. – prl Jun 08 '22 at 14:27
  • Is the U/S bit set for the code and stack pages? It's really unusual to not change RIP and RSP to different (user mode) pages when switching to user mode. – prl Jun 08 '22 at 14:35
  • Do you have a TSS containing a stack pointer for the switch back to ring 0 when an exception occurs in ring 3? – prl Jun 08 '22 at 14:39
  • @prl Yep, you are right: for a data segment the long-mode bit must be clear, thanks for reminding me. As for exceptions that occur in ring 3, I need to investigate more since I'm not familiar with that. It seems that I set the rsp field to my boot loader stack, not the kernel stack, so I need to check that later. The reason I keep rip/rsp unchanged is that I only want to change the CPL to 3 to execute some ring-3-only SGX instructions; there are no other isolation requirements, so for simplicity I chose not to change the base_addr, rip/rsp… – Xiangyi Meng Jun 08 '22 at 18:32
  • Ok, but that means your pages need to have the U/S bit set, so code-fetch can still read the page in ring 3. Otherwise the PTE is only valid for ring 0. Have you tried single-stepping with a debugger? This is one of the huge advantages of QEMU and Bochs, vs. running on bare metal and just seeing the final result. – Peter Cordes Jun 08 '22 at 19:11
  • @PeterCordes I've tried single-stepping with gdb. iretq executes fine, but when I try to execute the next instruction, it silently dies with a SIGQUIT signal. Maybe the problem is the U/S bit; I'll check it and reply here later, thanks! – Xiangyi Meng Jun 09 '22 at 03:50
  • Use a simulator that lets you see what exceptions happen. The guest CPU can't just kill a process with SIGQUIT; it has to take an exception which gets handled by the guest kernel. If your debug setup isn't tracing into that, get a better debug setup. – Peter Cordes Jun 09 '22 at 04:00
  • @PeterCordes Actually, I use QEMU with the option -d int to log CPU exceptions, but weirdly nothing is output. So I am curious: if the kernel handles the exception, will QEMU still output a message like "check_exception"? – Xiangyi Meng Jun 09 '22 at 05:01
  • IDK, I haven't actually played around with OS debugging at all recently myself. You could try setting breakpoints on the kernel's exception entry points. Maybe edit your question to add that part, or test it separately for a known case like `div edx` to raise `#DE`, or a deref of a known-bad pointer to raise #PF or #GPF (non-canonical), in a case where you're *not* messing around with privilege levels. – Peter Cordes Jun 09 '22 at 05:09
  • @PeterCordes The original problem is now fixed by setting the U/S bit of the kernel code/data PTEs. Now I face a new problem: when an exception occurs in ring 3, the CPU/kernel cannot correctly get back to ring 0 (the CS register is set to 0xb instead of 0x8). I think I have to check whether the implementation of the IDT/TSS and the exception handler is correct. Thanks for your reply! – Xiangyi Meng Jun 09 '22 at 08:00
  • IDK, sorry, I mostly know performance details, not osdev details like that. But I think the relevant IDT entry should set CS (including privilege level) that the exception handler runs with. I don't know if the TSS can mess this up or not. – Peter Cordes Jun 09 '22 at 08:15
  • @PeterCordes It's okay. Your help is fully appreciated. I think I need to refer to Intel manuals and OSDev Wiki, also. Thank you very much! – Xiangyi Meng Jun 09 '22 at 08:33
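
The U/S requirement raised in the comments can be sketched as a check over the entries of one 4-level page walk: a ring 3 access is allowed only if every paging-structure entry on the path is present and has U/S set. This is only an illustrative sketch; the function name and the convention of passing the four entries in an array are assumptions, and R/W, NX and other protection checks are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (1ULL << 0)   /* present */
#define PTE_US (1ULL << 2)   /* user/supervisor: 1 = accessible from ring 3 */
#define PTE_PS (1ULL << 7)   /* page size: large page when set in a PDPTE or PDE */

/*
 * entries[0] = PML4E, entries[1] = PDPTE, entries[2] = PDE, entries[3] = PTE.
 * Returns true only if every entry used by the walk is present and has U/S = 1.
 */
static bool walk_allows_user(const uint64_t entries[4])
{
    for (int level = 0; level < 4; level++) {
        uint64_t e = entries[level];
        if (!(e & PTE_P))
            return false;    /* not mapped at all */
        if (!(e & PTE_US))
            return false;    /* supervisor-only at this level */
        if ((level == 1 || level == 2) && (e & PTE_PS))
            return true;     /* 1 GiB or 2 MiB page: the walk ends here */
    }
    return true;
}
```

With this model, a mapping whose PTE has U/S set but whose PML4E does not is still supervisor-only, which is why the bit has to be set at every level for the code and stack pages used after iretq.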

1 Answer

I finally fixed it by setting the U/S bit for all kernel code/data pages. Thanks for all of your comments, @prl @PeterCordes!

Xiangyi Meng
  • So, setting U/S fixed the page fault. But you still might want to fix page fault handling, so you don't have the same problem again the next time there is a mistake in your code. – prl Jun 11 '22 at 14:11
  • @prl Yes, it is fixed. I had mistakenly set the conforming bit of the kernel code segment descriptor. This bit makes the CPL stay at 3 when an exception occurs and control is transferred back to the kernel. – Xiangyi Meng Jun 13 '22 at 12:55
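
The effect of the conforming bit described in the last comment can be illustrated with a deliberately simplified model of which CS value the handler ends up running with. The function name and parameters are hypothetical, and the model skips all the privilege checks the CPU performs (for example, faulting when the handler segment's DPL is greater than the current CPL).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: CS value visible in the exception handler.
 *   idt_selector - code selector stored in the IDT gate (e.g. 0x08)
 *   handler_dpl  - DPL of the code segment that selector refers to
 *   conforming   - conforming bit of that code segment
 *   cpl          - privilege level at the time of the exception
 */
static uint16_t handler_cs(uint16_t idt_selector, unsigned handler_dpl,
                           bool conforming, unsigned cpl)
{
    /* Conforming: the handler keeps the interrupted CPL.
       Non-conforming: the CPL becomes the segment's DPL. */
    unsigned new_cpl = conforming ? cpl : handler_dpl;

    /* The low two bits of CS reflect the new CPL. */
    return (uint16_t)((idt_selector & ~3u) | new_cpl);
}
```

Under this model, handler_cs(0x08, 0, true, 3) yields 0x0b, matching the CS value reported in the comments, while clearing the conforming bit yields 0x08.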