This is to support pthread cancellation points; a signal handler can later look at the stack.
The commit log for the commit that introduced this code explains that storing a pointer at a known place on the stack before a syscall makes it possible for the "cancellation signal handler" to determine "whether the interrupted code was in a cancellable state." (The initial version of that code also saves the address of the syscall instruction, but later commits changed that.)
The first arg (which that asm function stores on the stack) comes from its C caller, __syscall_cp_c, which calls __syscall_cp_asm(&self->cancel, nr, u, v, w, x, y, z), where self came from __pthread_self().
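As a rough illustration of that hand-off, here is a minimal C model. Everything in it is a hypothetical stand-in (struct thread_sketch, known_slot, and the *_sketch functions are not the library's names), the early return on a set flag is an assumption about the test/branch, and the "known place on the stack" is modeled as a plain global so the sketch can run anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the thread descriptor; only the one
   field discussed here is modeled. */
struct thread_sketch {
    volatile int cancel;   /* nonzero once cancellation was requested */
};

/* Hypothetical stand-in for the known stack slot the asm stub writes.
   A global is used only to keep the model portable. */
static volatile int *known_slot;

/* Model of the asm stub: test the flag through the pointer, publish
   the pointer at the known location, then "make the syscall". */
static long syscall_cp_asm_sketch(volatile int *cancel_ptr, long nr)
{
    if (*cancel_ptr)
        return -1;          /* assumption: bail out instead of entering the kernel */
    known_slot = cancel_ptr;
    return nr;              /* placeholder for the real syscall result */
}

/* Model of the C caller: the first argument is the address of the
   cancel field of the current thread. */
static long syscall_cp_c_sketch(struct thread_sketch *self, long nr)
{
    assert(self != NULL);
    return syscall_cp_asm_sketch(&self->cancel, nr);
}
```

Calling syscall_cp_c_sketch(&t, nr) with t.cancel == 0 leaves known_slot equal to &t.cancel, which is the piece of state a handler could later inspect.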
You're correct: overwriting the caller's stack arg with a different incoming arg is not "visible" to a C caller following the x86-64 System V ABI. (A callee owns its stack args; the caller has to assume they've been overwritten, so compiler-generated code will never read that memory location as an output.) So we needed to look for alternate explanations.
Using 2 total mov instructions to copy the incoming RDI into 8(%rsp) after reading that memory location is, I think, necessary. We can't delay the mov %rdx,%rdi until after the load because we need to free up RDX to hold R8, to free up R8 to hold the load. You could avoid touching an "extra" register by using R10 before it's used to load the other arg, but it would still take at least 2 instructions.
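The dependency chain is easier to follow in a toy model. The sketch below treats the registers as struct fields and the top of the stack as a small array; struct regs_sketch and shuffle_sketch are hypothetical names, and the instruction order is my reconstruction of a shuffle of this shape (function-call registers into syscall registers), not a verbatim copy of the library's source.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: only the registers the shuffle touches. */
struct regs_sketch {
    uint64_t rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11;
};

/* stack[0] models the return address at (%rsp),
   stack[1] models 8(%rsp), stack[2] models 16(%rsp). */
static void shuffle_sketch(struct regs_sketch *r, uint64_t stack[3])
{
    assert(r != 0 && stack != 0);
    r->r11 = r->rdi;     /* mov %rdi,%r11    : park the pointer (mov 1 of 2) */
    r->rax = r->rsi;     /* mov %rsi,%rax    : syscall number */
    r->rdi = r->rdx;     /* mov %rdx,%rdi    : frees RDX */
    r->rsi = r->rcx;     /* mov %rcx,%rsi */
    r->rdx = r->r8;      /* mov %r8,%rdx     : frees R8 */
    r->r10 = r->r9;      /* mov %r9,%r10     : frees R9 */
    r->r8  = stack[1];   /* mov 8(%rsp),%r8  : read the slot before clobbering it */
    r->r9  = stack[2];   /* mov 16(%rsp),%r9 */
    stack[1] = r->r11;   /* mov %r11,8(%rsp) : store the pointer (mov 2 of 2) */
}
```

The store to stack[1] has to come after the load from it, and that load needs R8 free, which in turn needs RDX free, which in turn needs the old RDI value copied somewhere first. That is why the pointer takes a detour through a scratch register.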
Or the arg order could be optimized to pass that pointer in a later arg, perhaps passing the call number last and the pthread pointer in the last register arg (minimal shuffling but avoiding need for a double dereference for that test/branch) or the first stack arg (where you want it anyway). Or match the arg order of the __syscall wrapper that takes nr first with no pthread pointer.