For a shared library file, how do I convert between the file offset and the virtual address of a symbol's definition?

The ELF specification says, for a symbol in a symbol table:

In executable and shared object files, st_value holds a virtual address. To make these files' symbols more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation) for which the section number is irrelevant.

But how can I get the corresponding offset in the file? Or, given an offset, how can I calculate the virtual address (file interpretation to memory interpretation)?

Imagine a scenario like this. During the execution of a process, suppose it is using a function implemented in a shared library, say libx.so, and that the library file is mapped into a region represented by vma.

// addr holds the value of PC; vm_pgoff counts pages, so shift by PAGE_SHIFT
offset = (vma->vm_pgoff << PAGE_SHIFT) + (addr - vma->vm_start);

As I understand it, offset now holds the offset of the instruction in the library file. Given this offset, I'd like to know the function name. One way is to calculate the virtual address corresponding to offset and compare it with the st_values in the symbol table. If the st_values are sorted in ascending order, then st_value_1 <= virtual_address < st_value_2 means st_name_1 is what I'm looking for. So the problem lies in the conversion.
For reference, the data structure of a symbol table entry is:

typedef struct{
  Elf32_Word     st_name; 
  Elf32_Addr     st_value;
  Elf32_Word     st_size;
  unsigned char  st_info;
  unsigned char  st_other;
  Elf32_Half     st_shndx;
}Elf32_Sym;
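The lookup described above can be sketched roughly as follows. This is only an illustration: `struct func_sym` and `find_function` are hypothetical names, the array is assumed to hold only function symbols already sorted by ascending st_value, and a zero st_size is treated as "size unknown", in which case the nearest preceding symbol is returned as a best guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record (not the real Elf32_Sym):
 * the name is assumed to be already resolved from the string table. */
struct func_sym {
    uint32_t    st_value; /* link-time virtual address of the function */
    uint32_t    st_size;  /* size in bytes, 0 if unknown */
    const char *name;
};

/* Return the name of the function containing vaddr, or NULL if none.
 * syms must be sorted by ascending st_value. */
const char *find_function(const struct func_sym *syms, size_t n,
                          uint32_t vaddr)
{
    size_t lo = 0, hi = n;

    /* Binary search: lo ends up as the number of symbols
     * whose st_value is <= vaddr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].st_value <= vaddr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL; /* vaddr lies below the first symbol */

    const struct func_sym *s = &syms[lo - 1];

    /* With a known size, reject addresses that fall in the gap
     * after the function instead of misattributing them. */
    if (s->st_size != 0 && vaddr - s->st_value >= s->st_size)
        return NULL;

    return s->name;
}
```

Checking st_size when it is non-zero avoids attributing code that sits between two symbols to the earlier one; whether st_size can be trusted for a given toolchain is an assumption of this sketch.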
dudu
  • Are you aware of the GOT and PLT? I'm not sure I understand your exact question, but I believe reading [htwsl](https://software.intel.com/sites/default/files/m/a/1/e/dsohowto.pdf) can help you understand some of the concepts that are missing here. – Tomasz Andel Oct 12 '17 at 08:46
  • "If st_values are processed to be stored in ascending order, then st_value_1 < virtual_address < st_value_2 means st_name_1 is what I'm looking for.": You should use the symbol length to know the size of the function. There might be other things (functions not present in the symbol table) between st_value_1 and st_value_2. – ysdx Oct 17 '17 at 07:31
  • @ysdx According to ELF documentation, for data objects, st_size is the number of bytes contained in the object. But I think that rule does not hold for functions. For symbols that are functions, st_size is not reliable and I cannot find a way to get the function's size. – dudu Oct 17 '17 at 08:54
  • Yes indeed, it's not clearly indicated what st_size means for an STT_FUNC symbol, and it's not required to have a non-zero value. – ysdx Oct 17 '17 at 09:14

1 Answer

The PT_LOAD entries of the program header table define how the loader/linker is expected to map parts of the ELF file into the virtual address space. Use them to convert between file offsets and (relative) virtual memory addresses:

~$ readelf -l /lib/i386-linux-gnu/libc-2.24.so 

Elf file type is DYN (Shared object file)
Entry point 0x18400
There are 10 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00140 0x00140 R E 0x4
  INTERP         0x166374 0x00166374 0x00166374 0x00013 0x00013 R   0x4
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x00000000 0x00000000 0x1b01c8 0x1b01c8 R E 0x1000
  LOAD           0x1b0260 0x001b1260 0x001b1260 0x02c74 0x0579c RW  0x1000
  DYNAMIC        0x1b1db0 0x001b2db0 0x001b2db0 0x000f0 0x000f0 RW  0x4
  NOTE           0x000174 0x00000174 0x00000174 0x00044 0x00044 R   0x4
  TLS            0x1b0260 0x001b1260 0x001b1260 0x00008 0x00048 R   0x4
  GNU_EH_FRAME   0x166388 0x00166388 0x00166388 0x061ec 0x061ec R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x1b0260 0x001b1260 0x001b1260 0x01da0 0x01da0 R   0x1

For example, consider this symbol from the same library's symbol table:

   Num:    Value  Size Type    Bind   Vis      Ndx Name
   188: 0005df80    35 FUNC    GLOBAL DEFAULT   13 fopen@@GLIBC_2.1

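Its (relative) virtual address is 0x0005df80. It belongs to the first PT_LOAD entry, which covers relative virtual addresses from 0x00000000 up to 0x00000000 + 0x1b01c8. Its offset within the segment is Value - VirtAddr = 0x0005df80 - 0x00000000 = 0x0005df80. Its offset within the file is thus Offset + (Value - VirtAddr) = 0x000000 + 0x0005df80 = 0x0005df80. Note that it is the Offset column that is used here, not PhysAddr. For an address in the second PT_LOAD entry the two values differ: the DYNAMIC segment at virtual address 0x001b2db0 is at file offset 0x1b0260 + (0x001b2db0 - 0x001b1260) = 0x1b1db0, which matches its Offset column.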
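As a rough sketch, the conversion in both directions could look like the code below. `struct load_seg`, `vaddr_to_offset` and `offset_to_vaddr` are hypothetical names introduced for illustration; the struct only mirrors the Offset, VirtAddr and FileSiz columns of the PT_LOAD entries (p_offset, p_vaddr and p_filesz in the real Elf32_Phdr). Addresses that lie past FileSiz but within MemSiz (for instance zero-initialised data) have no file offset, so the sketch reports them as not found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one PT_LOAD program header. */
struct load_seg {
    uint32_t p_offset; /* "Offset": start of the segment in the file */
    uint32_t p_vaddr;  /* "VirtAddr": link-time (relative) virtual address */
    uint32_t p_filesz; /* "FileSiz": number of bytes backed by the file */
};

/* Virtual address -> file offset.
 * Returns 0 and stores the result in *off, or -1 if no segment
 * holds file contents for vaddr. */
int vaddr_to_offset(const struct load_seg *segs, size_t n,
                    uint32_t vaddr, uint32_t *off)
{
    for (size_t i = 0; i < n; i++) {
        if (vaddr >= segs[i].p_vaddr &&
            vaddr - segs[i].p_vaddr < segs[i].p_filesz) {
            *off = segs[i].p_offset + (vaddr - segs[i].p_vaddr);
            return 0;
        }
    }
    return -1;
}

/* File offset -> virtual address.
 * Returns 0 and stores the result in *vaddr, or -1 if the offset
 * is not part of any PT_LOAD segment. */
int offset_to_vaddr(const struct load_seg *segs, size_t n,
                    uint32_t off, uint32_t *vaddr)
{
    for (size_t i = 0; i < n; i++) {
        if (off >= segs[i].p_offset &&
            off - segs[i].p_offset < segs[i].p_filesz) {
            *vaddr = segs[i].p_vaddr + (off - segs[i].p_offset);
            return 0;
        }
    }
    return -1;
}
```

These are link-time addresses, as stored in st_value. For a shared object, the run-time address is obtained by adding the base address at which the library was loaded.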

ysdx