
I have this program:

static int aux() {
    return 1;
}
int _start(){
    int a = aux();
    return a;
}

When I compile it using GCC with flags -nostdlib -m32 -fpie and generate an ELF binary, I get the following assembly code:

00001000 <aux>:
    1000:   f3 0f 1e fb             endbr32 
    1004:   55                      push   %ebp
    1005:   89 e5                   mov    %esp,%ebp
    1007:   e8 2d 00 00 00          call   1039 <__x86.get_pc_thunk.ax>
    100c:   05 e8 2f 00 00          add    $0x2fe8,%eax
    1011:   b8 01 00 00 00          mov    $0x1,%eax
    1016:   5d                      pop    %ebp
    1017:   c3                      ret    

00001018 <_start>:
    1018:   f3 0f 1e fb             endbr32 
    101c:   55                      push   %ebp
    101d:   89 e5                   mov    %esp,%ebp
    101f:   83 ec 10                sub    $0x10,%esp
    1022:   e8 12 00 00 00          call   1039 <__x86.get_pc_thunk.ax>
    1027:   05 cd 2f 00 00          add    $0x2fcd,%eax
    102c:   e8 cf ff ff ff          call   1000 <aux>
    1031:   89 45 fc                mov    %eax,-0x4(%ebp)
    1034:   8b 45 fc                mov    -0x4(%ebp),%eax
    1037:   c9                      leave  
    1038:   c3                      ret    

00001039 <__x86.get_pc_thunk.ax>:
    1039:   8b 04 24                mov    (%esp),%eax
    103c:   c3                      ret

I know that the get_pc_thunk function is used to implement position-independent code in x86, but in this case I can't understand why it is being used. My questions are:

  1. The function returns the address of the next instruction in the eax register and, in both usages, an add instruction then makes eax point to the GOT. Normally (at least when accessing global variables), eax would then be used immediately to access a global variable through the table. In this case, however, eax is completely ignored. What is going on?
  2. I also don't understand why the get_pc_thunk is even present in the code, since both call instructions are using relative addresses. Since the addresses are relative, shouldn't they already be position-independent out of the box?

Thanks!

felipeek

2 Answers


You haven't enabled optimisation, so GCC emits function prologues without regard to whether they are useful in the function in question.

To see the result of get_pc_thunk actually being used, access a global variable.

To remove the useless calls to get_pc_thunk, enable optimisation, for example by adding -O2 to the GCC command line.
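A minimal sketch of the case where the GOT pointer is actually consumed: a function that reads and writes a global variable. The names `counter` and `bump` are hypothetical. Compiled with something like `gcc -m32 -fpie -O2 -S`, GCC typically calls a `__x86.get_pc_thunk.*` helper, adds `_GLOBAL_OFFSET_TABLE_`, and then addresses `counter` relative to that GOT pointer (e.g. via a `@GOTOFF` offset); the exact instruction sequence depends on the GCC version.

```c
/* Hypothetical example: data with static storage duration.
 * 32-bit x86 has no EIP-relative addressing, so in PIE code GCC must
 * first materialize the GOT address (via a get_pc_thunk call) and then
 * address `counter` relative to it. */
int counter = 0;

int bump(void)
{
    /* Both the load and the store of `counter` need the GOT pointer
     * under -m32 -fpie; with -m64 they would use RIP-relative addressing
     * instead, so no thunk is needed there. */
    return ++counter;
}
```

Comparing the `-m32` and `-m64` output of this file side by side is a quick way to see why the thunk exists only in 32-bit PIE code.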

Timothy Baldwin
  • The OP knows that `get_pc_thunk` is there because of PIE (the default on many distros). But for other future readers: build with `-fno-PIE -no-pie` because [32-bit PIC/PIE is nasty and slow](https://stackoverflow.com/questions/43367427/32-bit-absolute-addresses-no-longer-allowed-in-x86-64-linux) and harder than necessary to understand. Or just build 64-bit code: RIP-relative addressing avoids the need for PIC/PIE code to find its own address via `call`. – Peter Cordes Jul 09 '20 at 19:35
  • @PeterCordes I'm not sure I'd agree to build with `-fno-PIE -no-pie`, unless you mean purely to look at the disassembly of and not to actually use. It's a really important security feature even if it is slow. – Joseph Sible-Reinstate Monica Jul 10 '20 at 00:20
  • Thanks for the answer. If I use `-O2`, the call goes away as you said. If, however, I move the `aux()` function to another compilation unit, `get_pc_thunk` is still called, even with `-O2`, and, again, its return value is ignored. Any clue? – felipeek Jul 10 '20 at 03:14
  • @felipeek: Did you compile *both* compilation units with `-O2`? Unless it needs to access *data* with static storage (e.g. a global variable), it doesn't need a GOT pointer so it doesn't need to `get_pc`. – Peter Cordes Jul 10 '20 at 03:55
  • @PeterCordes yes, I'm running `gcc -o main main.c aux.c -fpie -nostdlib -m32 -O2` (the only difference is that the aux() function was moved to aux.c). The generated binary still sets up the GOT pointer before calling `aux()`, but the `call` instruction itself is relative, so the GOT pointer is ignored. – felipeek Jul 10 '20 at 04:07
  • @felipeek: Ok yes, https://godbolt.org/z/Yere9o shows that effect. IIRC, EBX=GOT is probably assumed/required by the PLT itself, and the call has to go through the PLT because, when compiling this compilation unit, it's not *known* that an `aux` definition will be *statically* linked with it. Possibly with a "hidden" ELF visibility attribute we could get that to go away. – Peter Cordes Jul 10 '20 at 04:11
  • @PeterCordes IMO it makes sense to have the call in object files generated for compilation units that are calling external functions, but I don't understand why the linker (or the 'flow') is not removing them in the final binary. Are we saying that all calls to other compilation units will invoke a totally unnecessary call in the final binary when PIE is required? – felipeek Jul 10 '20 at 04:21
  • @felipeek: Good question. The linker doesn't know when it can relax a `call foo@plt` to `call foo` because that also disables symbol interposition. Even if there is a definition of `foo` in this ELF shared object, a definition in one loaded earlier could override it / take precedence. I think this "problem" is due to the fact that PIE executables evolved out of a kind of hack: put an entry point in a shared object and the dynamic linker will be willing to run it. i.e. at an ELF level, PIE executables are the same as `.so`, and `-fpie` and `-fPIC` look the same to the linker. – Peter Cordes Jul 10 '20 at 04:25
  • @felipeek: The linker can go the other way, though: if making a normal non-PIE executable (ELF type = EXEC), it can turn `call foo` into `call foo@plt`, but that PLT itself doesn't have to be PIE/PIC so it doesn't need EBX=GOT. – Peter Cordes Jul 10 '20 at 04:26

If, however, I move the aux() function to another compilation unit, get_pc_thunk is still called, even with -O2, and, again, its return value is ignored.

IIRC, the EBX=GOT pointer is assumed/required by the PLT itself, and the call has to go through the PLT because, when compiling this compilation unit, it's not known that an aux definition will be statically linked with it. (https://godbolt.org/z/Yere9o shows that effect for main with just a prototype for aux(), not a definition it can inline.)

With a "hidden" ELF visibility attribute, we can make that go away: the compiler then knows it doesn't need to indirect through the PLT, because the target of a call rel32 will be known at static link time without needing a runtime relocation: https://godbolt.org/z/73dGKq

__attribute__((visibility("hidden"))) int aux(void);
int _start(){
    int a = aux();
    return a;
}

GCC 10.1 with -O2 -m32 -fpie:

_start:
        jmp     aux
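When many internal functions are involved, annotating every prototype gets tedious. GCC (and Clang) also accept `#pragma GCC visibility push(hidden)` / `pop`, which gives hidden visibility to every declaration between them, including plain extern prototypes. A sketch, assuming the prototypes would normally live in a header shared by the executable's own .c files; `aux2` and `call_both` are hypothetical names. The `-fvisibility=hidden` command-line option is not a full substitute here: per the GCC manual it does not affect extern declarations, which is exactly what a caller in another translation unit sees.

```c
/* Everything declared between push and pop gets hidden visibility, as if
 * each prototype carried __attribute__((visibility("hidden"))). */
#pragma GCC visibility push(hidden)
int aux(void);
int aux2(void);
#pragma GCC visibility pop

/* Definitions. In the scenario from the question these would live in
 * another .c file of the same executable (e.g. aux.c); they are here only
 * so the sketch is self-contained. */
int aux(void)  { return 1; }
int aux2(void) { return 2; }

/* Calls to hidden functions can be direct `call rel32` in PIE code,
 * with no PLT indirection and therefore no need to set up EBX = GOT. */
int call_both(void)
{
    return aux() + aux2();
}
```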

IMO it makes sense to have the call in object files generated for compilation units that are calling external functions, but I don't understand why the linker (or the 'flow') is not removing them in the final binary.

@felipeek: Good question. The linker doesn't know when it can relax a call foo@plt to call foo because that also disables symbol interposition. Even if there is a definition of foo in this ELF shared object, a definition in one loaded earlier could override it / take precedence. I think this "problem" is due to the fact that PIE executables evolved out of a kind of hack: put an entry point in a shared object and the dynamic linker will be willing to run it. i.e. at an ELF level, PIE executables are the same as .so, and -fpie and -fPIC look the same to the linker.

The linker can go the other way, though: if making a normal non-PIE executable (ELF type = EXEC), it can turn call foo into call foo@plt, but that PLT itself doesn't have to be PIE/PIC so it doesn't need EBX=GOT.

Are we saying that all calls to other compilation units will invoke a totally unnecessary call in the final binary when PIE is required?

No, only ones in 32-bit PIE code where you fail to tell the compiler that it's an "internal" symbol using ELF "hidden" visibility. You can even have 2 names for the same symbol, one with hidden visibility, so you can make a function that libraries can resolve by name, but that you can still call from within the executable using simple call rel32 instead of clunky indirect calls via the PLT.
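One way to get that two-names arrangement with GCC on ELF targets is the `alias` attribute combined with `visibility("hidden")`, the same idea glibc uses internally for its own calls. This is a sketch; `aux_internal` and `entry` are hypothetical names. `aux` keeps default visibility, so other ELF objects can still resolve it by name, while calls written against `aux_internal` inside this executable can be bound at static link time. One consequence worth noting: callers that use `aux_internal` keep using this object's definition even if `aux` itself gets interposed.

```c
/* Public definition: default visibility, so other ELF objects can
 * resolve it by name (and it remains interposable). */
int aux(void)
{
    return 1;
}

/* Second name for the same code, with hidden visibility. The alias
 * attribute must appear in the translation unit that defines `aux`;
 * other .c files of the same executable would only declare
 *     __attribute__((visibility("hidden"))) int aux_internal(void);  */
int aux_internal(void) __attribute__((alias("aux"), visibility("hidden")));

/* Internal callers use the hidden name, so even 32-bit PIE code can use
 * a direct call rel32 instead of going through the PLT. */
int entry(void)
{
    return aux_internal();
}
```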

This is one of the downsides of PIE. Even in 64-bit code, without the attribute you get jmp aux@PLT. (Or with -fno-plt, an indirect call using RIP-relative addressing for the GOT entry.)

32-bit PIE really hurts performance: on average around 15% slower (measured a while ago on the CPUs of that time; the number could be somewhat different now). The effect is much smaller on x86-64, where RIP-relative addressing is available: a couple of percent. The Q&A "32-bit absolute addresses no longer allowed in x86-64 Linux?" has some links to more details.

Peter Cordes
  • Thanks for all the clarification. I'm curious about your statement "Even if there is a definition of foo in this ELF shared object, a definition in one loaded earlier could override it / take precedence." AFAIK, and correct me if I am wrong, functions defined by the final application always take precedence over SO functions. So, theoretically, in my case, since the linker knows that my application has an entry point, couldn't it call my `aux` function directly? – felipeek Jul 10 '20 at 04:40
  • @felipeek: I wasn't sure if `LD_PRELOAD` could do symbol interposition on symbols in the main executable or not. Maybe not. But anyway, what if something links to this shared object as a library, *instead* of running this ELF shared object as an executable? An entry-point in an ELF shared object might just be there to run self tests for the functions it contains. (Although at this point, I think `ld` has a `-pie` option so it could "know" it's making a PIE and relax `foo@plt` to `foo` accordingly. So it might be fairly simple to fix that missed optimization. IDK, not a linking expert.) – Peter Cordes Jul 10 '20 at 04:45