
I am tracing wireshark-2.6.10 using Pin. At several points during initialization, I see calls such as this one:

00000000004e9400 <__libc_csu_init@@Base>:
  ...
  4e9449:       41 ff 14 dc             callq  *(%r12,%rbx,8)
  ...

The target of this call is 0x197db0, shown here:

0000000000197cb0 <_start@@Base>:
  ...
  197db0:       55                      push   %rbp
  197db1:       48 89 e5                mov    %rsp,%rbp
  197db4:       5d                      pop    %rbp
  197db5:       e9 66 ff ff ff          jmpq   197d20 <_start@@Base+0x70>
  197dba:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  ...

Pin reports that this address lies in the middle of its containing routine, _start@@Base. However, when I reach the same target in gdb, I see the following:

>│0x5555556ebdb0 <frame_dummy>                                    push   %rbp
 │0x5555556ebdb1 <frame_dummy+1>                                  mov    %rsp,%rbp
 │0x5555556ebdb4 <frame_dummy+4>                                  pop    %rbp
 │0x5555556ebdb5 <frame_dummy+5>                                  jmpq   0x5555556ebd20 <register_tm_clones>
 │0x5555556ebdba <frame_dummy+10>                                 nopw   0x0(%rax,%rax,1)
 │0x5555556ebdc0 <main_window_update()>                           xor    %edi,%edi

Note that after subtracting the load bias, the runtime target address matches the link-time address (0x5555556ebdb0 - 0x555555554000 = 0x197db0). So there seems to be a pseudo-routine called frame_dummy inside _start@@Base. How is that possible? And how can I extract the addresses of these pseudo-routines beforehand (i.e., before execution)?
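The bias arithmetic above can be sketched in Python. This is a minimal sketch, assuming a PIE executable whose first PT_LOAD segment has virtual address 0, so that the load bias equals the start of the executable's file-offset-0 mapping in /proc/<pid>/maps. The helper names load_bias and to_link_time are hypothetical, and the sample maps lines are illustrative.

```python
def load_bias(maps_text, exe_path):
    """Return the start address of exe_path's mapping at file offset 0, or None.

    Assumes the /proc/<pid>/maps line format:
    start-end perms offset dev inode pathname
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5] == exe_path and int(fields[2], 16) == 0:
            return int(fields[0].split("-")[0], 16)
    return None


def to_link_time(runtime_addr, bias):
    """Map a runtime address back to the link-time address objdump prints."""
    return runtime_addr - bias


if __name__ == "__main__":
    # Illustrative lines in the /proc/<pid>/maps format.
    maps = (
        "555555554000-5555555ec000 r--p 00000000 08:01 1234  /usr/bin/wireshark\n"
        "5555555ec000-555555a00000 r-xp 00098000 08:01 1234  /usr/bin/wireshark\n"
    )
    bias = load_bias(maps, "/usr/bin/wireshark")
    print(hex(to_link_time(0x5555556ebdb0, bias)))  # 0x197db0 (frame_dummy)
    print(hex(to_link_time(0x5555556f5870, bias)))  # 0x1a1870 (get_compiled_version_info)
```

Under the same assumption, the value 0x555555554000 is simply where gdb placed the PIE image, so the same subtraction applies to every address Pin reports for the main executable.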

UPDATE:

Calls into the middle of functions like these did not occur in GIMP or Anjuta, which are written almost entirely in C and which I built from source. They do occur in Inkscape and Wireshark, which are written in C++ and which I installed from packages (although I do not think the language is the cause).

At first, it seemed that this situation occurs only during initialization, before main() is called. However, in wireshark-2.6.10 it occurs in at least one place after main() starts: lines 522-524 of wireshark-qt.cpp, which are part of main():

/* Get the compile-time version information string */
comp_info_str = get_compiled_version_info(get_wireshark_qt_compiled_info,
                      get_gui_compiled_info);

This is a call to get_compiled_version_info(). In assembly, the function is called at address 0x5555556e74c2 (0x1934c2 without bias), as shown below:

>│0x5555556e74c2 <main(int, char**)+178>  callq  0x5555556f5870 <get_compiled_version_info>
 │0x5555556e74c7 <main(int, char**)+183>  lea    0x4972(%rip),%rdi        # 0x5555556ebe40 <get_wireshark_runtime_info(_GString*)>
 │0x5555556e74ce <main(int, char**)+190>  mov    %rax,%r13

Again, the target is in the middle of another function, _ZN7QStringD1Ev@@Base:

00000000001980f0 <_ZN7QStringD1Ev@@Base>:
...
1a1870:       41 54                   push   %r12
...

This is the output of gdb (0x5555556f5870 - 0x555555554000 = 0x1a1870):

>│0x5555556f5870 <get_compiled_version_info>      push   %r12
 │0x5555556f5872 <get_compiled_version_info+2>    mov    %rdi,%r12
 │0x5555556f5875 <get_compiled_version_info+5>    push   %rbp
 │0x5555556f5876 <get_compiled_version_info+6>    lea    0x349445(%rip),%rdi        # 0x555555a3ecc2

As can be seen, the debugger recognizes this address as the start of get_compiled_version_info(), because it has access to the debug info. In all the cases I found, the symbols for these pseudo-routines had been removed from the shipped binary (its .symtab section was stripped). The strange part is that the function lies inside _ZN7QStringD1Ev@@Base, so Pin considers get_compiled_version_info() to be part of _ZN7QStringD1Ev@@Base.
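One way to see why this happens: a tool that attributes an address to the nearest preceding symbol, and trusts that symbol's size, will fold any unnamed function into whatever symbol precedes it. The following Python sketch is a generic containment lookup of that kind, not Pin's actual implementation. The function names are hypothetical, and the sizes used below are hypothetical too, chosen only so that _start appears to cover 0x197db0.

```python
import bisect


def build_index(symbols):
    """Sort (name, start, size) tuples by start address for binary search."""
    ordered = sorted(symbols, key=lambda s: s[1])
    return ordered, [s[1] for s in ordered]


def containing_routine(index, addr):
    """Return (name, offset) for the nearest symbol at or below addr that covers it.

    A symbol covers [start, start + size); a zero-size symbol covers only its start.
    """
    ordered, starts = index
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    name, start, size = ordered[i]
    if addr == start or addr < start + size:
        return name, addr - start
    return None


if __name__ == "__main__":
    # Stripped binary: only _start is known, and its (hypothetical) size spans frame_dummy.
    stripped = build_index([("_start", 0x197cb0, 0x200)])
    print(containing_routine(stripped, 0x197db0))  # ('_start', 256)

    # After adding frame_dummy from the debug symbols (size 0xa is hypothetical).
    merged = build_index([("_start", 0x197cb0, 0x200), ("frame_dummy", 0x197db0, 0xa)])
    print(containing_routine(merged, 0x197db0))  # ('frame_dummy', 0)
```

Choosing the nearest preceding start means a nested symbol shadows the one that encloses it; addresses past the nested symbol's end fall through to None rather than back to the enclosing symbol.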

TheAhmad

Answer

How is that possible?

frame_dummy is a bona fide C function. If Pin thinks it is in the middle of _start, that is probably because:

  1. _start is an assembly function, and
  2. its .st_size is set incorrectly in the symbol table.

You can confirm this by looking at readelf -Ws a.out | egrep ' (_start|frame_dummy)'.
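For a scripted version of that readelf check, here is a minimal Python sketch that walks the .symtab and .dynsym sections and reports the value and size of function symbols. It assumes a little-endian ELF64 file (such as an x86-64 binary) and ignores extended section numbering; the script and function names are hypothetical. Version suffixes such as @@Base, which readelf appends from the symbol-versioning sections, are not printed.

```python
import struct
import sys

SHT_SYMTAB, SHT_DYNSYM = 2, 11
STT_FUNC = 2


def parse_func_symbols(data):
    """Return (name, st_value, st_size) for every STT_FUNC symbol in .symtab/.dynsym.

    Assumes a little-endian ELF64 image and e_shnum != 0 (no extended numbering).
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    shoff, = struct.unpack_from("<Q", data, 0x28)
    shentsize, shnum = struct.unpack_from("<HH", data, 0x3A)
    # Each Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize.
    sections = [struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize)
                for i in range(shnum)]
    result = []
    for sh in sections:
        sh_type, sh_offset, sh_size, sh_link, sh_entsize = sh[1], sh[4], sh[5], sh[6], sh[9]
        if sh_type not in (SHT_SYMTAB, SHT_DYNSYM) or sh_entsize == 0:
            continue
        str_off = sections[sh_link][4]  # file offset of the linked string table
        for off in range(sh_offset, sh_offset + sh_size, sh_entsize):
            # Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size.
            st_name, st_info, _, _, st_value, st_size = struct.unpack_from("<IBBHQQ", data, off)
            if st_info & 0xF != STT_FUNC:
                continue
            end = data.index(b"\x00", str_off + st_name)
            name = data[str_off + st_name:end].decode(errors="replace")
            result.append((name, st_value, st_size))
    return result


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python elf_func_syms.py BINARY NAME...
    with open(sys.argv[1], "rb") as f:
        symbols = parse_func_symbols(f.read())
    wanted = set(sys.argv[2:])
    for name, value, size in symbols:
        if name in wanted:
            print(f"{value:#018x} {size:#8x} {name}")
```

For example, `python elf_func_syms.py /usr/bin/wireshark _start frame_dummy`. If the size reported for _start reaches past addresses such as 0x197db0, that is consistent with an incorrect .st_size.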

You are probably using a binary linked against a fairly old GLIBC.

GLIBC used to generate the C runtime startup files (where _start comes from) by running gcc -S to produce assembly from C source, then splitting and editing that assembly with sed. Getting the .size directive wrong was one problem with that approach, and it has not been used on x86_64 since 2012 (commit).

How can I extract the addresses for these pseudo-routines, beforehand (i.e., before execution)?

Pin doesn't magically create these pseudo-routines; they must be visible in the readelf -Ws output of the original binary.

Employed Russian
  • Thanks @EmployedRussian. Your hints are, as always, invaluable. My main problem is that `Pin` does not detect these pseudo-routines (`frame_dummy`,...). `readelf` also does not show the symbol. I installed the `dbgsym` package and saw that the symbol was there. In any case, having the definition of a routine inside another routine looks particularly strange. Specifically, I need to generate a backtrace in `Pin`, and I detect calls by comparing the start of each `BB` with the start addresses of routines, so this rare case is problematic. But if this happens only before `main()`, I can safely ignore it. – TheAhmad Nov 28 '19 at 23:46
  • Well, I handled the problem using the addresses in the debug info. – TheAhmad Jan 13 '20 at 21:30
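The debug-info approach from the last comment can be sketched as follows: collect routine entry addresses from the separate debug symbols (for example, by running nm on the debug file that the dbgsym package installs), and treat a control transfer as a call into a routine only when its target is one of those entries. This is a hedged Python sketch, assuming nm's default output format of "value type name"; the helper names and the sample nm output are hypothetical. The addresses are link-time addresses, so runtime addresses from Pin need the load bias subtracted first.

```python
CODE_TYPES = {"T", "t"}  # nm type letters for defined symbols in the text (code) section


def routine_entries(nm_output):
    """Map entry address -> name for defined code symbols in `nm` output.

    Assumes nm's default format: "<hex value> <type letter> <name>".
    Undefined symbols have no value column and are skipped.
    """
    entries = {}
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] in CODE_TYPES:
            entries[int(parts[0], 16)] = parts[2]
    return entries


def call_target_name(entries, link_time_target):
    """Return the routine name if the target is a known entry, else None."""
    return entries.get(link_time_target)


if __name__ == "__main__":
    # Illustrative nm output; real addresses come from your own debug file.
    sample = (
        "0000000000197cb0 T _start\n"
        "0000000000197d20 t register_tm_clones\n"
        "0000000000197db0 t frame_dummy\n"
        "00000000001a1870 T get_compiled_version_info\n"
        "                 U strlen\n"
    )
    entries = routine_entries(sample)
    print(call_target_name(entries, 0x197db0))  # frame_dummy
    print(call_target_name(entries, 0x197db5))  # None (not a routine entry)
```

Matching exact entry addresses, rather than asking which symbol contains a target, sidesteps incorrect .st_size values entirely.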