
For some context, I'm inspecting a simple C++ program that uses the experimental transactional memory model, compiled with g++. I want to know exactly where register_tm_clones is called (you can see the function by running objdump on a simple program). This function is called even in a program as trivial as int main() {}.

I want to know where, over the whole lifetime of a general program, register_tm_clones is called. I set a breakpoint on it in GDB and took a backtrace:

Breakpoint 1, 0x00007ffff7c5e6e0 in register_tm_clones () from /usr/lib/libgcc_s.so.1
(gdb) bt
#0  0x00007ffff7c5e6e0 in register_tm_clones () from /usr/lib/libgcc_s.so.1
#1  0x00007ffff7fe209a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7fe21a1 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#3  0x00007ffff7fd313a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#4  0x0000000000000001 in ?? ()
#5  0x00007fffffffe390 in ?? ()
#6  0x0000000000000000 in ?? ()

So it's called when libgcc is opened by ld-linux at some point in the program. I checked that the program is linked against libgcc, and it is:

❯ ldd main
    linux-vdso.so.1 (0x00007fff985e4000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f7eb82dc000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f7eb8196000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f7eb817c000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f7eb7fb6000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7eb84ec000)

But how do I know when this is being called? (It's definitely not in main.) I know _start is the true entry point of the C++ program, that we run __libc_csu_init, and that after some more steps we get to main. How can I set breakpoints to see, in the grand picture, when ld decided to open libgcc, and consequently where register_tm_clones is called?

OneRaynyDay
  • Your question is unclear: you _already_ see _exactly_ where it is called. Also, `ld != ld-linux`, please don't mix them up to avoid confusing everyone. – Employed Russian May 14 '20 at 05:33
  • Apologies, I should be more clear about what I'm asking for. I meant I wanted to know exactly, from `_start` to the end of the program, where `register_tm_clones` is called. Also, I guess I'm not familiar with the difference between `ld` and `ld-linux`. Can you explain to me the difference so I can edit the problem statement? I've already updated the question to make the question I'm asking clearer. – OneRaynyDay May 14 '20 at 05:36

1 Answer


How can I set breakpoints to see, in the grand picture, when ld decided to open libgcc, and consequently where register_tm_clones is called?

You already see that.

I think your confusion resides in not understanding what happens when a dynamically linked process runs. Roughly, the steps are:

  1. The kernel creates a new process "shell" and mmaps the executable into it.

  2. The kernel observes that the executable has a PT_INTERP segment, and mmaps the file referenced there into the process as well. Here, the contents of PT_INTERP is /lib64/ld-linux-x86-64.so.2, aka the dynamic loader, not to be confused with /usr/bin/ld (aka the static linker).

    Further, because there is a program interpreter, the kernel transfers control to it (instead of calling _start in the main executable), because the main executable is not ready to run yet.

  3. When ld-linux starts running, it first relocates itself, then mmaps all the libraries that the main executable is directly linked against. You can see these libraries with readelf -d a.out | grep NEEDED.

    Note: since each of these libraries may itself directly depend on other libraries, this process is repeated recursively.

  4. The libraries are initialized (by calling their constructor functions, which are often called _init but can have a different name as well) <== this is where libgcc_s.so.1 is initialized, and its register_tm_clones is called.

  5. Once all libraries are loaded and initialized, ld-linux finally calls _start in the main executable, which will eventually call main.

Employed Russian
  • I see. Thank you for the thorough explanation! I was not sure when the dynamic linker does all the mmapping. I always thought `_start` was the actual beginning of the program. So if I'm not mistaken (just to summarize): the dynamic linker mmaps all of the needed shared libraries, runs their `_init` functions (or whatever they're called), and then calls `_start` in the main executable, is that correct? – OneRaynyDay May 14 '20 at 06:08
  • @OneRaynyDay Yes, that's correct. `_start` *is* the actual beginning of the program, but `ld-linux` runs for a while (tens of thousands of instructions) before reaching `_start` in the main executable. To make things more confusing, `ld-linux` has its own `_start` as well, and that's where the very first user-space instruction executes. Somewhat relevant: https://stackoverflow.com/a/22491581/50617 – Employed Russian May 14 '20 at 06:14
  • I see. Thank you so much! This helped clarify a lot of things, well beyond the scope of this question. I appreciate your help :) – OneRaynyDay May 14 '20 at 06:15