Why is syscall/sysret in legacy mode considered "sufficiently poorly designed"?

Question

See comments in https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64_compat.S

I understand that because 32-bit syscall/sysret doesn't save/restore ESP, it's necessary to handle NMIs through a task gate to guarantee a good stack pointer. Other than that, what obstacles keep an OS from adopting it? Are there operating systems that support it, or do all operating systems use sysenter/sysexit for fast system calls in 32-bit legacy mode?

score 5 · Accepted Answer · answered May 10 '20 at 08:47

Note: I've never dealt with legacy syscall, since it is an AMD-only instruction.

The main problem with legacy syscall is that it requires some form of per-CPU space in which to save the current registers.
As you know, the OS cannot save the registers on the stack (since ESP is not changed by the instruction), nor can it set up a different stack before saving the current one.

In a single CPU system (meaning Uniprocessor system, i.e. no SMP with or without hyperthreading), the OS can save the current registers in a known, fixed, location in memory.
Instructions like `mov DWORD [0badf00dh], esp` have the address encoded as an immediate, so no architectural register needs to be set up beforehand.
However, this won't work on SMP systems, where the same code is shared among all the CPUs, unless the OS uses the same region of memory for all of them (serializing access to it).
Note that you can't load a per-cpu pointer as this would necessarily overwrite some register.
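
As a rough illustration of the uniprocessor case, the entry point could look something like the NASM-style sketch below. The labels `saved_user_esp`, `kernel_stack` and `kernel_stack_top` are made up for the example, and the `ss:` override assumes that `syscall` reloads only CS and SS, so DS may still hold whatever the user program put there. It is a sketch, not a complete handler (for one thing, it ignores the eflags issue).

```nasm
bits 32

section .bss
saved_user_esp:   resd 1        ; hypothetical: the one fixed save slot
kernel_stack:     resb 4096     ; hypothetical: the one kernel stack
kernel_stack_top:

section .text
global legacy_syscall_entry
legacy_syscall_entry:
    ; On entry ESP is still the user's and ECX holds the return EIP.
    ; The slot address is encoded in the instruction itself, so no
    ; register has to be clobbered to find it.
    mov   [ss:saved_user_esp], esp
    mov   esp, kernel_stack_top       ; only now is the stack usable
    push  dword [ss:saved_user_esp]   ; move user ESP onto the kernel stack
    push  ecx                         ; user return EIP

    ; ... dispatch the system call here ...

    pop   ecx                         ; return EIP expected by sysret
    pop   esp                         ; back on the user stack
    sysret
```

Because `saved_user_esp` is a single global, two CPUs entering at the same time would overwrite each other's ESP, which is the SMP problem described above.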

Another important point is that legacy syscall doesn't save eflags, which makes writing its handler like walking on eggshells.
Furthermore, this instruction unconditionally clears VM and IF, making it harder to write reentrant code.

One way around it is with the calling convention: the OS could label a register (or a few) as volatile across the call (like ecx already is).
The problem is that you may end up saving more registers than you thought, making the performance gain thin.
Another implausible workaround could be to assemble the entry-point of syscall for each CPU at runtime (basically, just patching the moffsets fields), but this is extremely hacky.

In 64-bit mode, the OS can rely on swapgs to get a per-CPU pointer (or, more properly, a per-CPU base address) at which to store the current registers.
As swapgs loads the new GS base from an MSR, it can be set up in advance during OS initialization.
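
To make this concrete, here is a minimal NASM-style sketch of a 64-bit entry point built on swapgs. The offsets `PCPU_USER_RSP` and `PCPU_KERNEL_RSP` are hypothetical fields of a per-CPU block that the OS is assumed to have placed behind the kernel GS base MSR at boot. The sketch also assumes interrupts are masked on entry and leaves out everything a real kernel adds around this.

```nasm
bits 64

PCPU_USER_RSP    equ 0x00   ; hypothetical: scratch slot for the user RSP
PCPU_KERNEL_RSP  equ 0x08   ; hypothetical: top of this CPU's kernel stack

section .text
global syscall_entry_64
syscall_entry_64:
    swapgs                            ; GS base now points at this CPU's block
    mov   [gs:PCPU_USER_RSP], rsp     ; per-CPU save, no register clobbered
    mov   rsp, [gs:PCPU_KERNEL_RSP]   ; switch to the kernel stack
    push  qword [gs:PCPU_USER_RSP]    ; user RSP
    push  r11                         ; user RFLAGS, saved by syscall
    push  rcx                         ; user RIP, saved by syscall

    ; ... dispatch the system call here ...

    pop   rcx
    pop   r11
    pop   rsp                         ; back on the user stack
    swapgs                            ; restore the user's GS base
    o64 sysret
```

Every CPU runs the same code, but each one sees its own block through GS, so nothing has to be patched or serialized.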

Note that on 64-bit systems, the OS can also use the upper GPRs, as Linux does, to save esp into, for example, r8d.
This works when handling 32-bit compatibility mode programs.
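
A sketch of that idea, in the spirit of the Linux compat entry but not copied from it, could look like this. `PCPU_KERNEL_RSP` is again a hypothetical per-CPU field, and the sketch assumes the handler runs in 64-bit mode after a syscall issued from 32-bit compatibility-mode code.

```nasm
bits 64

PCPU_KERNEL_RSP  equ 0x08   ; hypothetical: top of this CPU's kernel stack

section .text
global syscall_entry_compat
syscall_entry_compat:
    swapgs                            ; per-CPU block reachable through GS
    mov   r8d, esp                    ; park user ESP in r8 (zero-extended);
                                      ; 32-bit code cannot name r8-r15
    mov   rsp, [gs:PCPU_KERNEL_RSP]   ; switch to the kernel stack
    push  r8                          ; user ESP, now safely on the stack
    push  r11                         ; user flags, saved by syscall
    push  rcx                         ; user return address, saved by syscall

    ; ... dispatch the 32-bit system call here ...
```

No memory slot is needed for the stack pointer here, because the upper registers act as free scratch space that the 32-bit caller cannot address.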

To make a long story short: legacy syscall makes it really hard for the OS to save the current context in a per-cpu region of memory.

Thank you Margaret! Very comprehensive answer. Just one more question: does the `eflags` issue apply to `sysenter` too? — Zuxy, May 11 '20 at 01:05
@Zuxy If the stack is sound, you can simply do a `pushf(d)` to solve the issue :) — Margaret Bloom, May 11 '20 at 09:03
