I'm analysing how the compiler implements variable-length arrays (VLAs) in C99. Below are my C code and its disassembly, annotated with my understanding. The code was compiled with "-O3 -fomit-frame-pointer -fno-stack-protector -fpie".
C code:
#include <stdio.h>
int main() {
    size_t sz;              // size_t is unsigned
    scanf("%zd", &sz);      // note: "%zu" is the matching conversion for size_t
    volatile char s[sz+1];  // volatile keeps the array from being optimized away
    s[sz] = '\0';
}
disassembly:
Reading symbols from a.out...
(gdb) disass main
Dump of assembler code for function main():
0x0000000000001060 <+0>: endbr64
0x0000000000001064 <+4>: push %rbp # save the current frame pointer.
0x0000000000001065 <+5>: lea 0xf98(%rip),%rdi # rdi = "%zd". 1st param
0x000000000000106c <+12>: xor %eax,%eax # eax = 0.
0x000000000000106e <+14>: mov %rsp,%rbp # set the new frame pointer.
0x0000000000001071 <+17>: sub $0x10,%rsp # allocate 16 bytes; rsp stays 16-byte aligned.
0x0000000000001075 <+21>: lea -0x8(%rbp),%rsi # rsi = &sz. 2nd param.
0x0000000000001079 <+25>: callq 0x1050 <__isoc99_scanf@plt> # call __isoc99_scanf
# volatile char s[sz+1]; // volatile keeps the array from being optimized away
0x000000000000107e <+30>: mov -0x8(%rbp),%rcx # rcx = sz
0x0000000000001082 <+34>: mov %rsp,%rdi # rdi = rsp.
0x0000000000001085 <+37>: lea 0x10(%rcx),%rax # rax = sz + 1 + 15
0x0000000000001089 <+41>: mov %rax,%rdx # rdx = sz + 1 + 15
0x000000000000108c <+44>: and $0xfffffffffffff000,%rax # round rax down to a multiple of 4096
0x0000000000001092 <+50>: sub %rax,%rdi # rdi = rsp - (whole pages): where rsp must end up after the loop
0x0000000000001095 <+53>: and $0xfffffffffffffff0,%rdx # round rdx down to a multiple of 16
0x0000000000001099 <+57>: mov %rdi,%rax # rax = target rsp for the loop
0x000000000000109c <+60>: cmp %rax,%rsp # if there are no whole pages (sz+16 < 4096),
0x000000000000109f <+63>: je 0x10b6 <main()+86> # then skip the loop and jump to main+86.
# the stack grows by one page (4096 bytes) per iteration of the loop.
0x00000000000010a1 <+65>: sub $0x1000,%rsp # grow the stack.
0x00000000000010a8 <+72>: orq $0x0,0xff8(%rsp) # probe stack(???).
0x00000000000010b1 <+81>: cmp %rax,%rsp # if rsp isn't equal to rax,
0x00000000000010b4 <+84>: jne 0x10a1 <main()+65> # then loop.
0x00000000000010b6 <+86>: and $0xfff,%edx # edx = remainder below 4096
0x00000000000010bc <+92>: sub %rdx,%rsp # allocate the remainder.
0x00000000000010bf <+95>: test %rdx,%rdx # if the remainder is not zero,
0x00000000000010c2 <+98>: jne 0x10cc <main()+108> # then, jump to probe stack(?).
0x00000000000010c4 <+100>: movb $0x0,(%rsp,%rcx,1) # s[sz] = '\0'
0x00000000000010c8 <+104>: xor %eax,%eax # eax = 0.
0x00000000000010ca <+106>: leaveq # restore the previous stack frame.
0x00000000000010cb <+107>: retq # return 0;
0x00000000000010cc <+108>: orq $0x0,-0x8(%rsp,%rdx,1) # probe stack(??).
0x00000000000010d2 <+114>: jmp 0x10c4 <main()+100> # jump back.
End of assembler dump.
"https://nullprogram.com/blog/2019/10/27/"
says that, first, -fomit-frame-pointer is ignored because a function with a VLA has to track its stack frame dynamically. Second, when -fstack-clash-protection is enabled, the compiler generates extra code to probe every page of the allocation in case one of those pages is a guard page.
But in my disassembly, I don't understand these lines:
# the stack grows by one page (4096 bytes) per iteration of the loop.
0x00000000000010a1 <+65>: sub $0x1000,%rsp # grow the stack.
0x00000000000010a8 <+72>: orq $0x0,0xff8(%rsp) # probe stack(???).
0x00000000000010b1 <+81>: cmp %rax,%rsp # if rsp isn't equal to rax,
0x00000000000010b4 <+84>: jne 0x10a1 <main()+65> # then loop.
What does "orq $0x0,0xff8(%rsp)" mean, and what is stack probing?