I have loaded an IDT with 256 entries, all pointing to similar handlers:

  • for exceptions 8 and 10-14, push the exception number (these exceptions push an error code automatically)
  • for the others, push a "dummy" error code and the exception number;
  • then jump to a common handler

So when the common handler is entered, the stack is properly aligned and contains, from the top down, the exception/interrupt number, the error code (which may just be a dummy), eip, cs and eflags.
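The layout described above can be sketched in C as follows. This is only an illustration under assumptions: the struct and function names are hypothetical, the frame is the 32-bit same-privilege case, and the predicate covers the vectors listed in the question plus vector 17 (alignment check), which also pushes an error code on later CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the stack on entry to the common handler,
 * lowest address (top of stack) first. */
struct int_frame {
    uint32_t vector;      /* pushed by the per-vector stub */
    uint32_t error_code;  /* pushed by the CPU, or a dummy pushed by the stub */
    uint32_t eip;         /* the next three are pushed by the CPU */
    uint32_t cs;
    uint32_t eflags;
};

/* True if the CPU itself pushes an error code for this exception vector.
 * 8 and 10-14 come from the question; 17 (#AC) is an addition. */
bool cpu_pushes_error_code(uint8_t vector)
{
    return vector == 8 || (vector >= 10 && vector <= 14) || vector == 17;
}

/* Bytes the common handler has to discard before iret:
 * the vector number and the (real or dummy) error code. */
uint32_t bytes_to_discard_before_iret(void)
{
    return 2 * sizeof(uint32_t);
}
```

With this layout, the stub for a vector where `cpu_pushes_error_code` is false pushes a dummy value first, so the common handler can always discard 8 bytes before `iret`.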

My question is about returning from the interrupt handler. I use iret to return after removing the exception number and the error code from the stack, but this doesn't work for exception number 8; if I leave the error code on the stack, then it returns fine!

Questions:

  • do I have to leave the error code on the stack for exceptions that put the error code there? If so, how does iret determine whether it has to pop an error code or not?
  • as soon as I enable interrupts I always get exception 8 (double fault), but then everything runs fine (I'm developing a hobby OS). Is this normal behavior or do I have a bug somewhere?
Ciro Santilli OurBigBook.com
Joao da Silva
  • Also, pointers to the intel manuals would be most welcome :) I haven't found anything regarding these problems there yet. – Joao da Silva Jan 29 '09 at 13:39

6 Answers

If the CPU pushed an error code automatically, the handler must pop it before the iret. The iret instruction doesn't know where you're coming from, whether it's a fault, a trap or an external interrupt. It always does the same thing, and it assumes that there's no error code on the stack.

Quoting from the SDM (Software Developer's Manual), Volume 3, Chapter 5, section 5.13 titled Error Code:

The error code is pushed on the stack as a doubleword or word (depending on the default interrupt, trap, or task gate size). To keep the stack aligned for doubleword pushes, the upper half of the error code is reserved. Note that the error code is not popped when the IRET instruction is executed to return from an exception handler, so the handler must remove the error code before executing a return.

You can find the IA-32 Software Developer's Manual here: http://www.intel.com/products/processor/manuals/

Volume 3 part 1, chapter 5, describes exception and interrupt handling. Volume 2 part 1 has the spec for the iret instruction.
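To make the "iret always does the same" point concrete, here is a small C model of a 32-bit, same-privilege iret. It is a sketch under assumptions, not an emulator: `model_iret` and `struct iret_result` are hypothetical names, and the model only shows which stack slots end up in which register.

```c
#include <stdint.h>

/* What a same-privilege 32-bit iret loads from the stack. */
struct iret_result {
    uint32_t eip;
    uint32_t cs;      /* only the low 16 bits form the selector */
    uint32_t eflags;
};

/* Pops three dwords starting at the current top of stack:
 * EIP first, then CS, then EFLAGS. Nothing else is inspected. */
struct iret_result model_iret(const uint32_t *esp)
{
    struct iret_result r;
    r.eip    = esp[0];
    r.cs     = esp[1] & 0xFFFF;
    r.eflags = esp[2];
    return r;
}
```

If the stack still holds an error code on top (for example `{error code, eip, cs, eflags}`), calling `model_iret` on it loads the error code into EIP and the low half of the saved EIP into CS, whereas calling it one dword further down restores the interrupted context.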

Nathan Fellman
I had a similar problem with "double faults" as soon as I enabled interrupts. Well, they looked like double faults, but they really were timer interrupts!

Double faults are interrupt number 8.

Unfortunately, a default PIC configuration signals timer interrupts as interrupt number (DEFAULT_PIC_BASE + TIMER_OFFSET) = (8 + 0) = 8.

Masking out all my PIC interrupts (until I was ready to properly configure the PIC) silenced these double-fault-lookalike timer interrupts.

(PICs require the CPU to acknowledge interrupts before they produce the next one. Since your code wasn't acknowledging the initial timer interrupt, the PIC never gave you any more! That's why you only got one, rather than the zillion one might have expected.)
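The vector arithmetic above, and the usual way around it, can be sketched in C. This is a hedged illustration: the function and type names are hypothetical, the code only builds the list of port writes instead of executing them (a kernel would issue each one with its own `outb` helper), and the values are the conventional 8259A initialization words for a PC.

```c
#include <stddef.h>
#include <stdint.h>

enum { PIC1_CMD = 0x20, PIC1_DATA = 0x21, PIC2_CMD = 0xA0, PIC2_DATA = 0xA1 };

struct port_write { uint16_t port; uint8_t value; };

/* Vector the CPU sees for a given IRQ line (0-15). */
uint8_t irq_to_vector(uint8_t master_base, uint8_t slave_base, uint8_t irq)
{
    return (uint8_t)(irq < 8 ? master_base + irq : slave_base + (irq - 8));
}

/* Fills out[] with the writes that remap both PICs and then mask every IRQ.
 * Returns the number of writes (always 10). */
size_t pic_remap_sequence(uint8_t master_base, uint8_t slave_base,
                          struct port_write out[10])
{
    const struct port_write seq[10] = {
        { PIC1_CMD,  0x11 }, { PIC2_CMD,  0x11 },              /* ICW1: init, ICW4 follows */
        { PIC1_DATA, master_base }, { PIC2_DATA, slave_base }, /* ICW2: vector bases */
        { PIC1_DATA, 0x04 }, { PIC2_DATA, 0x02 },              /* ICW3: slave on IRQ2 */
        { PIC1_DATA, 0x01 }, { PIC2_DATA, 0x01 },              /* ICW4: 8086 mode */
        { PIC1_DATA, 0xFF }, { PIC2_DATA, 0xFF },              /* mask all IRQs for now */
    };
    for (size_t i = 0; i < 10; i++)
        out[i] = seq[i];
    return 10;
}
```

With the default master base of 8, `irq_to_vector(8, 0x70, 0)` gives 8, which collides with the double fault vector; remapping to bases such as 0x20 and 0x28 moves the timer to vector 32. Once IRQs are unmasked, each handler also has to acknowledge the interrupt by writing the end-of-interrupt command 0x20 to port 0x20 (and to 0xA0 as well for IRQs 8-15).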

stalepretzel
Do I have to leave the error code on the stack for exceptions that put the error code there?

As others mentioned, you have to do either:

pop %eax
/* Do something with %eax */
iret

Or if you want to ignore the error code:

add $4, %esp
iret

If you don't, iret will pop the error code as the new EIP and the saved EIP as the new CS, and you are likely to get a general protection fault, as mentioned at: Why does iret from a page fault handler generate interrupt 13 (general protection fault) and error code 0x18?

I've created a minimal working page fault handler to illustrate this. Try commenting out the pop and watch it blow up.

Compare the above with a division error exception, which does not need to pop the stack.

Note that if you simply execute int $14, no error code gets pushed: that only happens when the CPU raises the actual exception.

Intel Manual Volume 3 System Programming Guide - 325384-056US September 2015, Table 6-1 "Protected-Mode Exceptions and Interrupts": the "Error Code" column shows which exceptions and interrupts push an error code.

38.9.2.2 "Page Fault Error Codes" explains what the error means.

A neat way to deal with this is to push a dummy error code 0 on the stack for the interrupts that don't do this to make things uniform. James Molloy's tutorial does exactly that.

The Linux kernel 4.2 seems to do something similar. In arch/x86/entry/entry_64.S it models interrupts with has_error_code:

trace_idtentry page_fault do_page_fault has_error_code=1

and then uses it on the same file as:

.ifeq \has_error_code
pushq $-1 /* ORIG_RAX: no syscall to restart */
.endif

which does the push when has_error_code=0.

Ciro Santilli OurBigBook.com
  • Most OSes would set up their IDT so `int $14` or `int $0` from user-space isn't allowed. i.e. it doesn't enter the same exception handler as an actual #PF or #DE, instead it will cause a #GP. This allows exception-handlers to assume an error code is present or not as appropriate, and doesn't let user-space fake things. – Peter Cordes May 02 '23 at 19:46
  • @PeterCordes : Yes, specifically the IDT in such OSes would have entries for those interrupts with a Descriptor Privilege Level (DPL) set to 0 (Ring 0) rather than set to 3 (Ring 3). If the Current Privilege Level (CPL) is less than or equal to the DPL of the gate, the software interrupt is allowed; otherwise it raises a #GP. This particular check doesn't apply to exceptions raised by the CPU or to external interrupts. – Michael Petch May 02 '23 at 21:32
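The DPL check discussed in these comments can be sketched in C for a 32-bit protected-mode IDT. The names below are hypothetical, and the code only encodes a gate and models the privilege comparison; it does not load an IDT.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit protected-mode IDT gate descriptor (8 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0-15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* P (bit 7) | DPL (bits 5-6) | type (0xE = 32-bit interrupt gate) */
    uint16_t offset_high;  /* handler address bits 16-31 */
};

struct idt_gate make_interrupt_gate(uint32_t handler, uint16_t selector, uint8_t dpl)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = (uint8_t)(0x80 | ((dpl & 3) << 5) | 0x0E);
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Check applied to a software `int n`: allowed only if CPL <= gate DPL.
 * Exceptions raised by the CPU and external interrupts skip this check. */
bool software_int_allowed(uint8_t cpl, uint8_t gate_dpl)
{
    return cpl <= gate_dpl;
}
```

A gate built with `dpl` 0 gets `type_attr` 0x8E, so `software_int_allowed(3, 0)` is false and a ring 3 `int $14` faults with #GP instead of reaching the page fault handler without an error code.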
I wrote a small x86 OS a while back. Take a look at the file isr.asm in the cvs repository.

Notice how we set up the handlers: most push a dummy dword onto the stack to match the few exceptions for which the CPU pushes an error code automatically. Then, when we return via an iret, we can always assume 2 extra dwords on the stack irrespective of the interrupt, and perform an add esp, 8 before the iret to clean things up nicely.

That should answer your first question.

As for your second question: a double fault when you enable interrupts... hmmm, could be a problem with paging if you haven't set it up correctly. Could be a million other things too :)

QAZ
  • Hi Steve, thanks for the reply :) I don't have paging enabled and I only have 3 valid entries in the GDT (for code and data in ring 0, and another to use in real-mode). I'm usually in protected-mode but I go back to real-mode to use some BIOS routines (mostly reading from the disk). Any clues? :) – Joao da Silva Jan 29 '09 at 14:08
  • Sorry can't think of anything off hand, maybe switching back to real mode causes problems – QAZ Jan 29 '09 at 14:15
  • Link to your OS is dead, maybe you have a GitHub version? I'm making a collection of educational OSes. – Ciro Santilli OurBigBook.com Oct 28 '15 at 18:19
In 64-bit mode the interrupt stack is aligned on a 16-byte boundary before the return frame is pushed. So, regardless of whether the vector was entered by an int instruction, an exception, or a hardware interrupt, and whether or not it has an error code, a possible error code can be detected and discarded simply by masking the least significant bits of the stack pointer.

The following code gives a consistent, 16 byte aligned stack:

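    push 1                  ; marker: lands at [rsp+8] only if an error code is present
    push 0
    and rsp, -16            ; Align Stack to a 16 byte boundary

L1: ...

    test byte [rsp+8], 1
    jz L2

    ... ; Error code present

    jmp L3

L2: ... ; Error code absent

L3: ...

    add rsp, 24             ; rsp now points at the saved rip in both cases
    iretq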
Stack Before
No error code       | With error code
ff8: ss             | ff8: ss
ff0: rsp            | ff0: rsp
fe8: rflags         | fe8: rflags
fe0: cs             | fe0: cs
fd8: rip <= rsp     | fd8: rip
                    | fd0: Error Code <= rsp

Stack At L1
No error code       | With error code
ff8: ss             | ff8: ss
ff0: rsp            | ff0: rsp
fe8: rflags         | fe8: rflags
fe0: cs             | fe0: cs
fd8: rip            | fd8: rip
fd0: 1              | fd0: Error Code
fc8: 0              | fc8: 1
fc0: xxxx <= rsp    | fc0: 0 <= rsp

For the case where there is no privilege level change, the CPU will align the stack on a 16 byte boundary before pushing the ss. Where there is a privilege level change (or the IST is used), simply ensure that the entries in the TSS are 16 byte aligned. To return, simply pop off the garbage or error code and execute iretq.

For more information see sections 6.14.2 and 7.7 in the "Intel® 64 and IA-32 Architectures Software Developer’s Manual - Volume 3A" available here.
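The address arithmetic behind this trick can be checked with a small C model. It is a sketch under the assumptions stated in this answer (the CPU aligns the stack to 16 bytes before pushing five quadwords, plus an optional error code); the function names are hypothetical and nothing here touches a real stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* rsp on handler entry, given the stack pointer the CPU starts from. */
uint64_t rsp_after_delivery(uint64_t start_rsp, bool has_error_code)
{
    uint64_t rsp = start_rsp & ~UINT64_C(0xF);  /* CPU aligns to 16 bytes */
    rsp -= 5 * 8;                               /* ss, rsp, rflags, cs, rip */
    if (has_error_code)
        rsp -= 8;                               /* error code */
    return rsp;
}

/* rsp at label L1: two pushes, then align down to 16 bytes. */
uint64_t rsp_at_l1(uint64_t entry_rsp)
{
    return (entry_rsp - 16) & ~UINT64_C(0xF);
}

/* An error code is present exactly when the entry rsp is 16-byte aligned. */
bool error_code_present(uint64_t entry_rsp)
{
    return (entry_rsp & 0xF) == 0;
}
```

Starting from 0x1000, the model gives an entry rsp of 0xfd8 without an error code and 0xfd0 with one, and `rsp_at_l1` returns 0xfc0 in both cases, which matches the stack diagrams above and explains why a fixed `add rsp, 24` reaches the saved rip either way.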

  • You might want to correct these: "16 **bit** aligned stack", "privil**a**ge level", and "execute retiq". (16-byte aligned stack, privilege level, execute `iretq`) – Sep Roland May 05 '23 at 17:16
  • The stack is aligned after the interrupt occurs but before the return values are pushed in all cases if the tss entries are aligned (aligned on 16 byte boundary) as explained in the second paragraph. – Charles Cross May 07 '23 at 23:16
  • The stack doesn't have to be 16 byte aligned when an interrupt occurs (or if a priv level change the SS0:ESP0 in the TSS structure doesn't have to be 16 byte aligned although OS developers often have that kind of alignment). In 64-bit mode the error is a 64-bit value pushed on the stack so setting the low bit of the stack pointer isn't going to do what you think it will do. – Michael Petch May 02 '23 at 19:33