
I found the sample program below somewhere on the Web. Various copies of it abound, usually with small differences. My question concerns the size of the shadow area at the top of the stack when calling a function from the Windows API. This program works perfectly as shown, with decimal 40 subtracted from the stack pointer, to allow room for the 4 parameters that are passed in registers, plus one more. However, in this case there is no 5th parameter, and yet if the `sub rsp, 40` is changed to `sub rsp, 32`, and no other changes are made, the 'Hello World!' message box is no longer displayed! Is there some reason why, when only 4 parameters are involved, all of which are passed in registers, it's still necessary to reserve 40 (5*8) bytes at the top of the stack rather than only 32 (4*8)?

; Sample x64 Assembly Program
; Chris Lomont 2009 www.lomont.org
; command to assemble is: 
; ml64 hello.asm /link /subsystem:windows /defaultlib:kernel32.lib /defaultlib:user32.lib /entry:Start
extrn ExitProcess: PROC   ; in kernel32.lib
extrn MessageBoxA: PROC   ; in user32.lib
.data
caption db '64-bit hello!', 0
message db 'Hello World!', 0
.code
Start PROC
  sub    rsp, 40      ; shadow space, aligns stack
  mov    rcx, 0       ; hWnd = HWND_DESKTOP
  lea    rdx, message ; LPCSTR lpText
  lea    r8,  caption ; LPCSTR lpCaption
  mov    r9d, 0       ; uType = MB_OK
  call   MessageBoxA  ; call MessageBox API function
  mov    ecx, eax     ; uExitCode = MessageBox(...)
  call ExitProcess
Start ENDP
End
  • Yes, stack needs to be 16 byte aligned. – Jester Apr 23 '20 at 22:24
  • Okay, but wouldn't 32 be 16 byte aligned, since 16*2 = 32? I don't see how 40 is 16 byte aligned when it's 8 * an odd number. Sorry, not trying to be dense or argumentative. – Robert Watson Apr 23 '20 at 23:23
  • The `call` from the parent function already pushed 8 bytes onto the stack. So when you enter your function, the stack is already *misaligned* and needs to be fixed. – Nate Eldredge Apr 23 '20 at 23:26
  • Or, more importantly, the `call` you do will push 8 bytes. – Jester Apr 23 '20 at 23:27
  • @Jester: Maybe I have it backwards - is the stack supposed to be aligned to 16 bytes as of before the call, or after? – Nate Eldredge Apr 23 '20 at 23:29
  • Yeah you are correct, the stack is misaligned by 8 upon entry, but as long as you make sure you adjust by multiple of 16 bytes it will maintain whatever the (mis)alignment was. Since you only have control of your own stack adjustments and calls, that's what you should focus on. Obviously if you need aligned locals you have to take the actual misalignment into account. – Jester Apr 23 '20 at 23:36
  • I just tried this and it worked: `sub rsp, 32` then `mov rcx, 0fffffffffffffff0h` then `and rsp, rcx` – Robert Watson Apr 24 '20 at 00:09
  • Before the 'and' instruction rsp was 37A7DDF948. After the 'and' instruction rsp was 37A7DDF940. – Robert Watson Apr 24 '20 at 00:16
  • Ok, but then how are you going to restore RSP later so you can `ret`? Most functions have to return instead of calling ExitProcess. It's better to just `sub rsp, 40` so you know you have 8 bytes of space above the callee's shadow space, instead of moving the stack pointer either 8 or 0 bytes depending on incoming alignment. (The incoming alignment is fixed, so not taking advantage of it is silly). – Peter Cordes Apr 24 '20 at 00:21
  • And BTW, `0fffffffffffffff0h` can be represented by an 8-bit sign-extended immediate, so you could write `and rsp, 0fffffffffffffff0h` if you want. Or write it as `and rsp, -16` if you're comfortable enough with 2's complement to use negative numbers as round-down masks. Same machine code either way, just another way of representing the same value `0fffffffffffffff0h` in the source. – Peter Cordes Apr 24 '20 at 00:23
  • I hope this isn't a stupid question, but unless I either test the low 4 bits of RSP, or divide the address in RSP by 16 and then look for a remainder, how would I know whether the address in RSP is 16 byte aligned or not? BTW yes, I'm comfortable with 2's complement, so I think using -16 is a good idea. – Robert Watson Apr 24 '20 at 01:40
  • You trust that your caller also used the proper (mis)alignment according to the convention, so you don't need to worry about anything else apart from only adjusting the stack pointer by multiple of 16 in total. – Jester Apr 24 '20 at 10:38
  • I suppose the question is how did the author of the sample program know in advance that the stack would be misaligned by 8, so that he would need to subtract 40 from it instead of 32? I would prefer to do the 'and' so that it will always work, and doing the 'and' has exactly the same effect as subtracting the 40 instead of the 32 in the cases where it's needed. – Robert Watson Apr 24 '20 at 18:23
  • The kernel invokes your program with the stack aligned, or if it doesn’t, then the startup code for your program in the runtime library aligns the stack. In either case, it would be absurd for every function to use `and` to align the stack instead of just adjusting it by an odd multiple of 8 which (in conjunction with the return address pushed by the call) maintains the required alignment. – prl Apr 24 '20 at 19:16
  • @Jester, your comments are a little hard to understand. Each function needs to adjust the stack by an odd multiple of 8, not by a multiple of 16, to maintain alignment. – prl Apr 24 '20 at 19:18
  • The problem with using `and` in every function is that `and` is lossy, so the code then also has to waste another register to hold the prior value of the stack pointer. – prl Apr 24 '20 at 19:20
  • Again, the result in this case of subtracting 32 and then doing the `and` instruction is exactly the same as the result of subtracting 40 from the stack pointer in the first place. We're talking about a standalone program in which absolutely nothing was done with the stack pointer prior to the subtract, so there is nothing the program can do to prevent the misalignment. The program just has to cope with it. – Robert Watson Apr 24 '20 at 21:37
  • You are asking about windows. There is no such thing as standalone program. The OS launches your program with well defined initial state. But yeah, if you e.g. need larger alignment for whatever reason you can do something like that. The point about reversibility still stands: it's easy to add a constant back but you can't reverse `and` so you need to remember the original value if you want to restore it. – Jester Apr 24 '20 at 22:11
  • Yes, even back in prehistoric days your user program was called by an initiator somewhere, the registers for which were saved at the beginning of the program and restored at the end. I only meant that the program isn't called by any other user code. In the current context, many people seem to use the word function in a way with which I am not familiar, so that it includes programs that are called only by the OS. – Robert Watson Apr 26 '20 at 17:52
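
The arithmetic behind the comments above can be checked with a short sketch. It is written in Python rather than assembly purely so the numbers can be run and inspected. It assumes the RSP value reported in the thread (37A7DDF948 after `sub rsp, 32`); the names `ENTRY_RSP` and `is_aligned16` are made up for illustration and do not come from the original program.

```python
# Worked alignment arithmetic for the Start routine discussed above.
# Assumption: RSP was 0x37A7DDF948 after `sub rsp, 32` (the value quoted in
# the comments), so RSP on entry to Start was 32 bytes higher.
ENTRY_RSP = 0x37A7DDF948 + 32


def is_aligned16(addr: int) -> bool:
    """Return True when addr is a multiple of 16."""
    return addr % 16 == 0


# With the reported value, RSP on entry is 8 modulo 16, which is what the
# 8-byte return address pushed by the caller's `call` would produce.
entry_offset = ENTRY_RSP % 16

after_sub_32 = ENTRY_RSP - 32   # even multiple of 8: still off by 8
after_sub_40 = ENTRY_RSP - 40   # odd multiple of 8: now a multiple of 16
after_and = after_sub_32 & -16  # `and rsp, -16` rounds down to a multiple of 16

# `sub rsp, 40` is undone by `add rsp, 40`. The masked value alone does not
# record whether 0 or 8 bytes were dropped, so the old RSP would have to be
# saved somewhere before the `and`.
restored = after_sub_40 + 40

print(f"entry RSP          {ENTRY_RSP:X}  (mod 16 = {entry_offset})")
print(f"sub rsp, 32        {after_sub_32:X}  aligned: {is_aligned16(after_sub_32)}")
print(f"sub rsp, 40        {after_sub_40:X}  aligned: {is_aligned16(after_sub_40)}")
print(f"sub 32; and -16    {after_and:X}  aligned: {is_aligned16(after_and)}")
```

With these numbers, `sub rsp, 40` and `sub rsp, 32` followed by `and rsp, -16` both land on 37A7DDF940, the value reported after the `and`. The difference is only that the subtraction can be reversed with a constant, while the mask cannot.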

0 Answers