Note: I've never dealt with legacy syscall myself, since it is an AMD-only instruction.
The main problem with legacy syscall is that it requires some form of per-CPU space in which to save the current registers.
As you know, the OS cannot save the registers on the stack (since ESP is not changed by the instruction), nor can it set up a different stack before saving the current one.
On a single-CPU system (that is, a uniprocessor system: no SMP of any kind, hyperthreading included), the OS can save the current registers at a known, fixed location in memory.
Instructions like mov DWORD [0badf00dh], esp have the address encoded as an immediate, so no architectural registers need to be set up beforehand.
However, this won't work on SMP systems, where the same code is shared among all the CPUs, unless the OS uses the same region of memory for all of them (serializing access to it).
Note that you can't load a per-CPU pointer first, as this would necessarily overwrite some register.
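
To make the SMP problem concrete, here is a minimal C model (not kernel code) of two CPUs sharing one fixed save slot without serialization. The names shared_slot, struct cpu, enter_shared and leave_shared are made up for illustration; each function stands in for what the entry and exit code would do with a fixed-address mov.

```c
#include <stdint.h>

/* One save slot shared by every CPU, as with "mov [fixed_addr], esp". */
uint32_t shared_slot;

/* Only the state this model cares about: the stack pointer of one CPU. */
struct cpu {
    uint32_t esp;
};

/* Entry: park the user ESP in the shared slot, then switch stacks. */
void enter_shared(struct cpu *c, uint32_t kernel_stack)
{
    shared_slot = c->esp;
    c->esp = kernel_stack;
}

/* Exit: restore ESP from the shared slot, whatever it holds by now. */
void leave_shared(struct cpu *c)
{
    c->esp = shared_slot;
}
```

If CPU 0 enters with esp = 0x1000, CPU 1 then enters with esp = 0x2000, and CPU 0 leaves, CPU 0 comes back with 0x2000: the second entry overwrote the only slot. Avoiding this requires either serializing the whole entry path or giving each CPU its own slot, and the latter is exactly what cannot be addressed without a free register.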
Another important point is that legacy syscall doesn't save EFLAGS, which makes writing its handler like walking on eggshells.
Furthermore, this instruction also unconditionally clears VM and IF, making it harder to write reentrant code.
One way around it is through the calling convention: the OS could declare a register (or a few) as volatile across the call (like ecx already is).
The problem is that callers may then have to save more registers than expected, which makes the performance gain thin.
Another implausible workaround could be to assemble the entry point of syscall for each CPU at runtime (basically, just patching the moffset fields), but this is extremely hacky.
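
As a rough sketch of what that patching could look like, the C code below builds one copy of the first entry instruction per CPU and fills in a different absolute address in each copy. It assumes 32-bit code, where mov [addr], esp is encoded as the bytes 89 25 followed by the 4-byte little-endian address (strictly a disp32 field; the true moffs encodings exist only for al/ax/eax). The names ENTRY_TEMPLATE, build_entry_stub and build_all_stubs, and the slot layout, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STUB_SIZE         6  /* opcode + ModRM + 4 address bytes */
#define ADDR_FIELD_OFFSET 2  /* the address follows the opcode and ModRM */

/* Template for "mov [disp32], esp": 89 /r with ModRM 25h
 * (mod=00, reg=esp, rm=disp32). The zero bytes are patched per CPU. */
static const uint8_t ENTRY_TEMPLATE[STUB_SIZE] = {
    0x89, 0x25, 0x00, 0x00, 0x00, 0x00
};

/* Writes one stub into out, storing ESP at the absolute address save_slot. */
void build_entry_stub(uint8_t *out, uint32_t save_slot)
{
    assert(out != NULL);
    memcpy(out, ENTRY_TEMPLATE, STUB_SIZE);
    /* x86 encodes addresses little-endian; write byte by byte so the
     * result does not depend on the endianness of the host. */
    for (int i = 0; i < 4; i++)
        out[ADDR_FIELD_OFFSET + i] = (uint8_t)(save_slot >> (8 * i));
}

/* Builds ncpus consecutive stubs; CPU n saves ESP at
 * save_area_base + n * slot_stride (an assumed layout). */
void build_all_stubs(uint8_t *stubs, unsigned ncpus,
                     uint32_t save_area_base, uint32_t slot_stride)
{
    for (unsigned cpu = 0; cpu < ncpus; cpu++)
        build_entry_stub(stubs + (size_t)cpu * STUB_SIZE,
                         save_area_base + cpu * slot_stride);
}
```

Each CPU would then have to point its own syscall target (the low 32 bits of its STAR MSR) at its own stub, and every later instruction that touches the save area would need the same treatment, which is where the hackiness comes from.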
In 64-bit mode, the OS can rely on swapgs to get a per-CPU pointer (or more properly, a per-CPU base address) at which to store the current registers.
As swapgs loads from an MSR, this can be set up in advance during OS initialization.
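
The following C model (again, not real kernel code) sketches why this solves the problem: swapgs exchanges the active GS base with the value held in the per-CPU IA32_KERNEL_GS_BASE MSR, so the entry code reaches its own CPU's save area without touching any general-purpose register. The struct layouts and function names are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU save area; the field names are illustrative. */
struct percpu_area {
    uint64_t user_rsp;    /* where the entry code parks the user RSP */
    uint64_t kernel_rsp;  /* kernel stack to switch to */
};

/* The per-CPU state this model cares about. */
struct cpu_model {
    struct percpu_area *gs_base;         /* active GS base */
    struct percpu_area *kernel_gs_base;  /* IA32_KERNEL_GS_BASE MSR */
    uint64_t rsp;
};

/* swapgs: exchange the active GS base with the MSR value. */
void model_swapgs(struct cpu_model *cpu)
{
    struct percpu_area *tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* Models: swapgs ; mov [gs:user_rsp], rsp ; mov rsp, [gs:kernel_rsp] */
void model_syscall_entry(struct cpu_model *cpu)
{
    model_swapgs(cpu);
    assert(cpu->gs_base != NULL);  /* set up during OS initialization */
    cpu->gs_base->user_rsp = cpu->rsp;
    cpu->rsp = cpu->gs_base->kernel_rsp;
}

/* Models: mov rsp, [gs:user_rsp] ; swapgs */
void model_syscall_exit(struct cpu_model *cpu)
{
    assert(cpu->gs_base != NULL);
    cpu->rsp = cpu->gs_base->user_rsp;
    model_swapgs(cpu);
}
```

With two modeled CPUs whose kernel_gs_base fields point at different areas, both can run the same entry code at the same time and each one's user RSP ends up in its own area.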
Note that on 64-bit systems, the OS can also use the upper GPRs, as Linux does, to save esp into, for example, r8d.
This works when handling 32-bit compatibility-mode programs, since 32-bit code cannot access r8-r15 and so has nothing live in them.
To make a long story short: legacy syscall makes it really hard for the OS to save the current context in a per-CPU region of memory.