I am curious how programs like readelf, objdump and gdb know what to display next to callq instructions. Since the program has yet to run, how do they know how far to 'fall through' the .plt? Do they guess based on the arguments passed to it? Or do they actually do a mock run of the program to find out?

For example:

  400ca4:       e8 e7 fb ff ff          callq  400890 <printf@plt>
  400ca9:       48 8b 85 28 ff ff ff    mov    -0xd8(%rbp),%rax

The above code knows to go to printf() in the .plt at 0x400890:

0000000000400890 <printf@plt>:
  400890:       ff 25 ba 17 20 00       jmpq   *0x2017ba(%rip)        # 602050 <_GLOBAL_OFFSET_TA$
  400896:       68 07 00 00 00          pushq  $0x7
  40089b:       e9 70 ff ff ff          jmpq   400810 <_init+0x20>

This is just output from objdump -d so I'm not sure how the program knows it wants printf. The only correlation I can see is the relocation index (pushq $0x7) and the section .dynsym, though it is one value off because it starts at 0:

8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)

Another thing that confuses me is the reference to the GOT in the .plt entry (#602050). I see from readelf that it is part of .got.plt based on the address range, but how do these programs determine the value before the program is run?

[23] .got.plt          PROGBITS         0000000000602000  00002000
       00000000000000b8  0000000000000008  WA       0     0     8

** Edit **

Symbol table '.dynsym' contains 22 entries:

       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
         1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND free@GLIBC_2.2.5 (2)
         2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND putchar@GLIBC_2.2.5 (2)
         3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strncpy@GLIBC_2.2.5 (2)
         4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
         5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fclose@GLIBC_2.2.5 (2)
         6: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND strlen@GLIBC_2.2.5 (2)
         7: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (3)
         8: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (2)
         9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
        10: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND ftell@GLIBC_2.2.5 (2)
        11: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
        12: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND malloc@GLIBC_2.2.5 (2)
        13: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _IO_getc@GLIBC_2.2.5 (2)
        14: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fseek@GLIBC_2.2.5 (2)
        15: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fopen@GLIBC_2.2.5 (2)
        16: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND perror@GLIBC_2.2.5 (2)
        17: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND getopt@GLIBC_2.2.5 (2)
        18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND atoi@GLIBC_2.2.5 (2)
        19: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND exit@GLIBC_2.2.5 (2)
        20: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND fwrite@GLIBC_2.2.5 (2)
        21: 0000000000400a4d    34 FUNC    GLOBAL DEFAULT   13 err
Owen M

1 Answer

A little of this is going off of memory, but let's see if I can't help you out...

As to your first question, there's a chain of things that link together. I can't guarantee this is exactly how these tools do it, but it shows that there is a way.

  1. Each PLT entry (except PLT[0], which is special) corresponds 1-to-1 with an entry in the .rel(a).plt section. This section contains the relocations for the PLT entries.
  2. Each .rel(a).plt entry has an r_info field whose upper bits hold a symbol table index, e.g. into .dynsym.
  3. Each symbol table entry has an offset into the string table (e.g. .dynstr) for its name. This offset is a byte offset starting from the beginning of the string table.

So as you can see, you can follow the PLT to the .rel(a).plt, to the symbol table, to the string table, where you'll find "printf".
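The chain above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how readelf or objdump are actually implemented: the three byte strings are tiny synthetic stand-ins for .rela.plt, .dynsym and .dynstr (the GOT addresses in them are made up), and `plt_symbol_name` is a hypothetical helper name. The struct layouts follow Elf64_Rela and Elf64_Sym from /usr/include/elf.h, assuming a little-endian 64-bit file.

```python
import struct

# Elf64_Rela: r_offset (u64), r_info (u64), r_addend (i64)
RELA = struct.Struct("<QQq")
# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8),
#            st_shndx (u16), st_value (u64), st_size (u64)
SYM = struct.Struct("<IBBHQQ")

R_X86_64_JUMP_SLOT = 7  # relocation type used for PLT slots on x86-64


def plt_symbol_name(rela_plt, dynsym, dynstr, reloc_index):
    """Follow .rela.plt[reloc_index] -> .dynsym -> .dynstr (hypothetical helper)."""
    _r_offset, r_info, _r_addend = RELA.unpack_from(rela_plt, reloc_index * RELA.size)
    sym_index = r_info >> 32  # same as the ELF64_R_SYM() macro
    st_name = SYM.unpack_from(dynsym, sym_index * SYM.size)[0]
    end = dynstr.index(b"\0", st_name)  # names are NUL-terminated
    return dynstr[st_name:end].decode("ascii")


# Synthetic tables (assumed data, not taken from the binary in the question).
dynstr = b"\0free\0printf\0"
dynsym = b"".join([
    SYM.pack(0, 0x00, 0, 0, 0, 0),  # index 0: the mandatory null symbol
    SYM.pack(1, 0x12, 0, 0, 0, 0),  # index 1: "free",   GLOBAL FUNC, UND
    SYM.pack(6, 0x12, 0, 0, 0, 0),  # index 2: "printf", GLOBAL FUNC, UND
])
rela_plt = b"".join([
    RELA.pack(0x602018, (1 << 32) | R_X86_64_JUMP_SLOT, 0),  # PLT slot 0 -> sym 1
    RELA.pack(0x602020, (2 << 32) | R_X86_64_JUMP_SLOT, 0),  # PLT slot 1 -> sym 2
])

print(plt_symbol_name(rela_plt, dynsym, dynstr, 1))
```

In the binary from the question, the `pushq $0x7` would presumably play the role of `reloc_index`, and if that relocation carries symbol index 8 it would explain why the two numbers looked one apart: .dynsym has a null entry at index 0 that .rela.plt does not.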

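To answer your second question, nothing has to be read from the GOT to print that address. The jmpq operand is RIP-relative, so the disassembler just adds the displacement to the address of the next instruction (0x400896 + 0x2017ba = 0x602050). To see which section that lands in, take a look at the section headers (readelf -WS <program>) or the program headers (readelf -Wl <program>), which list the virtual addresses of the segments along with the sections mapped into them. That's where that address range comes from.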
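As a rough worked example, the addresses in the question's objdump output can be recomputed from the instruction bytes alone. The helper names below are hypothetical, and the sketch only handles the two encodings that appear in the question (`e8 rel32` and `ff 25 disp32`). It also assumes the usual x86-64 layout in which the first three .got.plt slots are reserved for the dynamic linker.

```python
import struct


def call_target(insn_addr, insn_bytes):
    """Target of 'e8 rel32': next instruction address plus signed rel32."""
    if insn_bytes[0] != 0xE8:
        raise ValueError("not a near call")
    (rel32,) = struct.unpack_from("<i", insn_bytes, 1)
    return insn_addr + len(insn_bytes) + rel32


def rip_relative_slot(insn_addr, insn_bytes):
    """Memory operand of 'ff 25 disp32' (jmpq *disp32(%rip))."""
    if insn_bytes[:2] != b"\xff\x25":
        raise ValueError("not an indirect RIP-relative jmp")
    (disp32,) = struct.unpack_from("<i", insn_bytes, 2)
    return insn_addr + len(insn_bytes) + disp32


# Bytes and addresses copied from the objdump output in the question.
plt_entry = call_target(0x400CA4, bytes.fromhex("e8e7fbffff"))
got_slot = rip_relative_slot(0x400890, bytes.fromhex("ff25ba172000"))

GOT_PLT_BASE = 0x602000   # .got.plt address from the readelf section listing
RESERVED_SLOTS = 3        # assumption: standard x86-64 .got.plt layout
slot_index = (got_slot - GOT_PLT_BASE) // 8
reloc_index = slot_index - RESERVED_SLOTS

print(hex(plt_entry), hex(got_slot), slot_index, reloc_index)
# prints: 0x400890 0x602050 10 7
```

The final value matches the `pushq $0x7` in the PLT entry, which is consistent with that operand being the .rela.plt index rather than a .dynsym index.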

Dan Fego
  • Thanks for the info. Just to clarify, readelf --relocs shows the info field with values like 000100000007 where the 4th digit increases by one as expected. Assuming this is the index, why does readelf -s (shows .symtab) display UND as index for all the functions? Under .symtab the functions are located about 45 entries in, whereas in .dynsym they are where they are expected to be but still with an index of UND. – Owen M Oct 02 '14 at 18:09
  • Since posting the first comment I've learned that `.rela.plt` has a sh_link value for `.dynsym` and I need to use the ELF64_R_SYM() macro to obtain/extract the index numbers. After doing this, the relationship between the two sections is clear, however I still do not see how these numbers correspond to `.dynstr`. The index numbers in `.dynsym` are listed as UND, and the ordering does not match up with the entries in the string table. – Owen M Oct 02 '14 at 21:17
  • @OwenM The symbols are UND because they're not _defined_ in the given binary -- they're defined in some other library (libc in this case). As for the strings in .dynsym matching up, the string table index is a byte offset from the beginning of .dynstr, not an index to the number of the string, so that should match up. – Dan Fego Oct 02 '14 at 21:46
  • Sorry for not grasping this straight away, but where is this byte offset found? According to http://geezer.osdevbrasil.net/osd/exec/elf.txt the byte offset is the name, but that doesn't make sense as output from `objdump -s binary` shows that `.dynsym` has no string data. – Owen M Oct 02 '14 at 22:10
  • @OwenM The offset that is *found* in .dynsym is a byte offset _into_ .dynstr. – Dan Fego Oct 02 '14 at 22:45
  • I have edited the original post. I must be blind, because I cannot see anything to do with an offset in that output. – Owen M Oct 02 '14 at 22:52
  • @OwenM readelf uses that information to show the name. If you take a look at the Elf64_Sym struct in /usr/include/elf.h, you can see the first member is st_name, which is commented as being "Symbol name (string tbl index)" – Dan Fego Oct 02 '14 at 22:54
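
To make the points from the comments concrete, here is a small sketch of how the 000100000007 value from `readelf --relocs` splits apart, and of the fields readelf reads out of an Elf64_Sym. The function names mirror the ELF64_R_SYM() and ELF64_R_TYPE() macros from elf.h; the sample symbol bytes and the `st_name` value 0x2b are made up for illustration.

```python
import struct

SHN_UNDEF = 0            # section index that readelf prints as "UND"
R_X86_64_JUMP_SLOT = 7


def elf64_r_sym(r_info):
    """Upper 32 bits of r_info: the symbol table index."""
    return r_info >> 32


def elf64_r_type(r_info):
    """Lower 32 bits of r_info: the relocation type."""
    return r_info & 0xFFFFFFFF


r_info = 0x000100000007  # value quoted in the first comment
print(elf64_r_sym(r_info), elf64_r_type(r_info) == R_X86_64_JUMP_SLOT)
# prints: 1 True

# A made-up Elf64_Sym for an imported function: st_name is the first member
# and is a byte offset into .dynstr (0x2b here is an arbitrary example).
raw_sym = struct.pack("<IBBHQQ", 0x2B, 0x12, 0, SHN_UNDEF, 0, 0)
st_name, st_info, st_other, st_shndx, st_value, st_size = struct.unpack("<IBBHQQ", raw_sym)

# st_shndx == SHN_UNDEF only says "not defined in this file"; the name is
# still found through st_name, which readelf resolves before printing.
print(hex(st_name), st_shndx == SHN_UNDEF)
# prints: 0x2b True
```

So the "UND" column and the name lookup are independent: the former comes from `st_shndx`, the latter from `st_name`, which is why the raw .dynsym bytes contain no string data.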