
According to some textbooks, the compiler will use sub* to allocate memory for local variables.

For example, I wrote a Hello World program:

#include <stdio.h>

int main()
{
    puts("hello world");
    return 0;
}

I would expect this to compile to something like the following assembly on a 64-bit OS:

    subq    $8, %rsp
    movq    $.LC0, (%rsp)
    callq   puts
    addq    $8, %rsp

The subq allocates 8 bytes of memory (the size of a pointer) for the argument, and the addq deallocates it.

But when I run gcc -S hello.c (I use llvm-gcc on Mac OS X 10.8), I get the following assembly:

.section    __TEXT,__text,regular,pure_instructions
.globl  _main
.align  4, 0x90
_main:
Leh_func_begin1:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    subq    $16, %rsp
Ltmp2:
    xorb    %al, %al
    leaq    L_.str(%rip), %rcx
    movq    %rcx, %rdi
    callq   _puts
    movl    $0, -8(%rbp)
    movl    -8(%rbp), %eax
    movl    %eax, -4(%rbp)
    movl    -4(%rbp), %eax
    addq    $16, %rsp
    popq    %rbp
    ret

   .......

L_.str:
    .asciz  "hello world!"

There is no subq/addq pair directly around the callq. Why? And what is the purpose of addq $16, %rsp?

Thanks for any input.

Ghjhdf
  • There _is_ a matching `subq`/`addq` pair in the assembly output (the function prologue and epilogue). I don't really see what your question is about. – Mat Oct 14 '12 at 14:37
  • Your example allocates a function call parameter, not a local variable (it's the alternative to `PUSH`). Also, the compiler has aggregated the parameter and local variable allocations. – Necrolis Oct 14 '12 at 14:44
  • I have been following a tutorial: [Writing a compiler in Ruby bottom up](http://www.hokstad.com/writing-a-compiler-in-ruby-bottom-up-step-2.html). But the compiler output shown there is different from mine. – Ghjhdf Oct 14 '12 at 14:55

1 Answer


You don't have any local variables in your main(). All you may have in it is a pseudo-variable for the parameter passed to puts(), the address of the "hello world" string.

According to your last disassembly, the calling conventions appear to be such that the first parameter to puts() is passed in the rdi register and not on the stack, which is why there isn't any stack space allocated for this parameter.

However, since you're compiling your program with optimization disabled, you may encounter some unnecessary stack space allocations and reads and writes to and from that space.

This code illustrates it:

subq    $16, %rsp        # allocate some space
...
movl    $0, -8(%rbp)     # write to it
movl    -8(%rbp), %eax   # read back from it
movl    %eax, -4(%rbp)   # write to it
movl    -4(%rbp), %eax   # read back from it
addq    $16, %rsp        # release the space

Those four mov instructions are equivalent to a single movl $0, %eax; no memory is needed for that.

If you add an optimization switch like -O2 in your compile command, you'll see more meaningful code in the disassembly.

Also note that some space allocations may be needed solely for the purpose of keeping the stack pointer aligned, which improves performance or avoids issues with misaligned memory accesses (you could get the #AC exception on misaligned accesses if it's enabled).

The code above shows this too: those four mov instructions use only 8 bytes of memory, while the sub and add instructions grow and shrink the stack by 16.

Alexey Frunze
  • Thanks for your useful answer. I have since found more information about this: the 64-bit architecture uses rdi, rsi, rdx, rcx, r8 and r9 to pass parameters rather than pushing them onto the stack. – Ghjhdf Oct 15 '12 at 06:56
  • 16-byte stack alignment is required for correctness because functions are allowed to use SSE loads/stores on their own stack memory: [glibc scanf Segmentation faults when called from a function that doesn't align RSP](//stackoverflow.com/q/51070716) / [Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](//stackoverflow.com/q/49391001). (I'm sure you came across this sometime in the 8 years since writing this answer, but commenting for future readers) – Peter Cordes Oct 09 '20 at 08:40
  • Enabling the AC flag is not really viable; gcc optimizations *will* intentionally do misaligned loads or stores sometimes, e.g. when zeroing an adjacent pair of elements in an array, like `arr[1]` and `arr[2]`. And glibc's hand-written `memcpy` freely uses unaligned loads/stores for short copies. – Peter Cordes Oct 09 '20 at 08:41