Use -fno-function-cse to disable common-subexpression elimination on function addresses. From the GCC manual:
-fno-function-cse
Do not put function addresses in registers; make each instruction that calls a constant function contain the function’s address explicitly.
This option results in less efficient code, but some strange hacks that alter the assembler output may be confused by the optimizations performed when this option is not used.
The default is -ffunction-cse.
How to find specific GCC options
I looked at the asm output of gcc -O1 -fverbose-asm to see all the optimization options that -O1 implies (GCC lists them in asm comments). Compiling with -O1 plus the -fno-... version of every one of them produced just 3 call instructions, each with the symbol name as its operand. That confirmed one of those options was the one I wanted, so I just had to narrow it down by bisecting that list of -fno- options.
I used the Godbolt compiler explorer, which has MSP430 GCC 6.2.1 (test code + asm). I disabled the "comments" filter option so I could see pure-comment lines in the asm output.
Since there were a ton of options, I used tr ' ' '\n' | sed -e 's/-f/-fno-/' -e '/;/d' to turn the -f options into their negative form. I copy/pasted the whole block of asm comments into that command in a terminal, and copy/pasted the result into the GCC options box on Godbolt, along with -O1. (-O0 is a special anti-optimized mode for consistent debugging, so an across-statement optimization might never be active at -O0 even with the right option. That's why I needed to negate the options on top of -O1 instead of trying the positive forms without it.)
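As a minimal sketch of what that pipeline does, here it is wrapped in a function and fed two made-up comment lines in the style of -fverbose-asm output. The function name negate_f_options and the sample input are only for illustration, and it assumes ';' is the asm comment character, as it is for MSP430.

```shell
# Turn -fverbose-asm option comments into a list of negated -fno-... options.
# Assumes ';' is the asm comment character and that the input contains only
# -f options (the first "-f" in each word is what gets rewritten).
negate_f_options() {
    tr ' ' '\n' | sed -e 's/-f/-fno-/' -e '/;/d'
}

# Made-up sample input, not the real option list for any target.
printf '%s\n' \
    '; -fauto-inc-dec -fbranch-count-reg' \
    '; -fcombine-stack-adjustments -ffunction-cse' \
    | negate_f_options
# prints:
# -fno-auto-inc-dec
# -fno-branch-count-reg
# -fno-combine-stack-adjustments
# -fno-function-cse
```

tr puts every word on its own line, the first sed expression rewrites -f to -fno-, and the second deletes the lines that hold only the ';' comment marker. Words that are not -f options are not filtered out, so only the option lines should be pasted in.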
Then I selected and removed a bunch of options to see if that changed the asm. If not, I kept going. When I found a block whose removal did change it, I knew the option I wanted was in there, so I could undo (control-z), remove all the other -f options, and then narrow it down to one. (As soon as I saw the name -fno-function-cse in that group, I figured that sounded like the right sort of thing. Fortunately, GCC options do have meaningful names if you know compiler / optimization terminology.)
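I did that narrowing by hand on Godbolt, but the same idea can be sketched as a small script. This is only an illustration under assumptions: bisect_flags and fake_check are hypothetical names, exactly one option is assumed to be responsible, and fake_check stands in for "compile with -O1 plus these options and check whether the asm has the direct calls".

```shell
# bisect_flags PREDICATE FLAG...
# PREDICATE is a command that gets a candidate set of flags as arguments and
# exits 0 if that set alone is enough to produce the asm we want.
# Assumes exactly one flag matters and that flags contain no spaces.
bisect_flags() {
    predicate=$1
    shift
    while [ "$#" -gt 1 ]; do
        half=$(( $# / 2 ))
        first=""
        second=""
        i=0
        for f in "$@"; do
            if [ "$i" -lt "$half" ]; then
                first="$first $f"
            else
                second="$second $f"
            fi
            i=$(( i + 1 ))
        done
        # Keep whichever half still contains the flag that matters.
        if $predicate $first; then
            set -- $first
        else
            set -- $second
        fi
    done
    printf '%s\n' "$1"
}

# Stand-in predicate for the demo: pretends the asm is "fixed" whenever
# -fno-function-cse is among the flags. A real one would run the compiler
# and inspect its asm output.
fake_check() {
    case " $* " in
        *" -fno-function-cse "*) return 0 ;;
        *) return 1 ;;
    esac
}

bisect_flags fake_check \
    -fno-auto-inc-dec -fno-branch-count-reg -fno-combine-stack-adjustments \
    -fno-function-cse -fno-guess-branch-probability
# prints: -fno-function-cse
```

Each round halves the candidate list, so even a long list of -fno- options needs only a handful of compiles.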
That was faster than looking at 1 option at a time, or wading through the manual, because I wasn't even sure that any of those specific options would control this.
BTW, GCC doesn't do that code-size optimization for most other ISAs because it's not a performance win for them. Code size isn't the most important factor for performance on x86-64 or even ARM Thumb; the extra cost of possible branch mispredictions for indirect jumps (and the extra pollution of the branch predictors) outweighs the code-size saving.
It is a code-size win on x86, where a 5-byte mov-immediate or 7-byte RIP-relative lea (x86-64) can set up for multiple 2-byte call instructions.
It's usually not even a code-size win on fixed-instruction-width ISAs like AArch64 or ARM (except in Thumb mode), where the standard code model assumes that functions will be in range of each other for relative branch-and-link (call) instructions. So calling any function takes one instruction, the same size as any other instruction.
Even with -ffunction-cse enabled explicitly, GCC simply does not do this optimization for x86-64 or ARM Thumb, even in a case where it already uses a function pointer from the GOT (x86-64 gcc -Os -fPIE -fno-plt -ffunction-cse on Godbolt). I even told GCC to optimize for code size: saving/restoring a call-preserved register like RBX for use with a 2-byte call rbx instead of a 6-byte call [RIP+rel32] would save size even after the extra instructions required to push/pop RBX (1 byte each) and load into RBX (one mov with a RIP-relative addressing mode).
This could be considered a missed optimization for -Os, especially for ARM Thumb on "simple" cores like -mcpu=cortex-m3, which might not even have a branch predictor.
(AArch64 will load a function pointer into a register with -fPIE -fno-plt, for functions without "hidden" visibility, i.e. where the function might only be in a shared library. This happens even with -fno-function-cse. https://godbolt.org/z/f3MP56.)