What's the rationale behind dropping the frame pointer on 64-bit architectures by default? I'm well aware that it can be enabled, but why does GCC disable it in the first place while keeping it enabled for 32-bit? After all, 64-bit CPUs have more registers than 32-bit ones.

Edit:

Looks like the frame pointer will also be dropped for x86 when using a more recent GCC version. From the manual:

Starting with GCC version 4.6, the default setting (when not optimizing for size) for 32-bit Linux x86 and 32-bit Darwin x86 targets has been changed to -fomit-frame-pointer. The default can be reverted to -fno-omit-frame-pointer by configuring GCC with the --enable-frame-pointer configure option.

But why?

asdf

1 Answer

For x86-64, the ABI (PDF) encourages the absence of a frame pointer. The rationale is more or less "we have DWARF now, so it's not necessary for debugging or exception unwinding; if we make it optional from day one, then no software will come to depend on its existence."

x86-64 does have more registers than x86-32, but it still doesn't have enough. Freeing up more general-purpose registers is always a Good Thing from a compiler's point of view. The operations that require a stack crawl are slower, yes, but they are rare events, so it's a good tradeoff for shaving a few cycles off every subroutine call plus fewer stack spills.
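To make the tradeoff concrete, here is a minimal sketch of a stack crawl that does not chase frame pointers. It assumes GCC or Clang on a target whose unwinder reads the CFI in `.eh_frame` (x86-64 Linux, for instance) and uses `_Unwind_Backtrace` from `<unwind.h>`. The names `count_frame`, `leaf`, `middle`, and `unwind_depth_demo` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unwind.h>   /* _Unwind_Backtrace, shipped with GCC and Clang */

/* Callback invoked by the unwinder once per frame it discovers. */
static _Unwind_Reason_Code count_frame(struct _Unwind_Context *ctx, void *arg)
{
    (void)ctx;                  /* the frame's context is not inspected here */
    assert(arg != NULL);
    ++*(int *)arg;
    return _URC_NO_REASON;      /* keep walking toward the outermost frame */
}

/* noinline keeps each function a real call, so it gets its own frame. */
__attribute__((noinline)) static int leaf(void)
{
    int depth = 0;
    _Unwind_Backtrace(count_frame, &depth);
    return depth;
}

__attribute__((noinline)) static int middle(void)
{
    volatile int depth = leaf();   /* the volatile store prevents a tail call */
    return depth;
}

/* Hypothetical entry point: returns how many frames the unwinder saw. */
int unwind_depth_demo(void)
{
    volatile int depth = middle();
    return depth;
}
```

Calling `unwind_depth_demo()` from `main` in a program built with something like `gcc -O2 -fomit-frame-pointer` should still report the frames for `leaf`, `middle`, and `unwind_depth_demo` plus their callers, because the walk relies on unwind tables, not on a saved `%rbp` chain. The exact count depends on the platform's startup code.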

zwol
  • How does the performance of DWARF unwinding via CFI compare to traversing the frames? I suppose it adds a lot of overhead because the .debug_frame section is not mapped into the process, i.e. it requires a few syscalls to open the ELF binary, then parsing the file to find the section, and only then can we parse the CFI. This all sounds very slow. – asdf May 23 '11 at 16:35
  • @asdf - what about code that does *not* do any unwinding? It has one more register free and does not have to set up the stack frame. – Bo Persson May 23 '11 at 16:54
  • Unwinding is done with the information in `.eh_frame` which *is* mapped into the process (it's a subset of `.debug_frame`). Yes, it's still slower than chasing frame pointers, but as I said, the assumption is that it's a rare event. – zwol May 23 '11 at 17:22
  • The [ABI](http://www.x86-64.org/documentation/abi.pdf) allows the use of `%rbp` to be avoided (see §3.2.2, footnote 7), but does not *require* that there not be a frame pointer (use of `%rbp` is shown in the stack frame layout in Figure 3.3, and it is described as "optionally used as frame pointer" in Figure 3.4). – Matthew Slattery May 23 '11 at 19:41
  • @MatthewSlattery: lol, as if we need permission from the ABI to keep whatever value we want in any call-preserved register like RBP or RBX that the ABI doesn't put any other requirements on. (Although the ABI does at least once overstep its bounds, declaring that a `bool` in a register must be stored as 0 or 1 (not 0 / non-zero) even inside a function, not just on call/ret boundaries. And that local arrays on the stack must have 16-byte alignment if they're VLA or size>=16 bytes. The same requirement on global arrays (static storage) makes sense, but other functions can only see pointers..) – Peter Cordes Apr 06 '19 at 10:44
  • Or is there support in DWARF CFI for referencing saved registers relative to RBP instead of RSP, so you can omit `.cfi` updates when changing RSP? I guess there must be, to support functions with VLAs or `alloca` where compilers choose to make a stack frame for the same reason. So not so laughable after all. – Peter Cordes Apr 06 '19 at 10:47