
In Intel's PinTool, you can print out the "instruction address" of every instruction in a program by using either IARG_INST_PTR or INS_Address. I've observed that running the same program at different points in time produces different instruction addresses for the exact same instructions. However, I would expect the addresses to remain the same across runs. What's the root cause of this variation? I've attached two sample outputs below, which show the opcode and instruction address of the first three instructions that are executed.

How can I find the PC of each instruction, or get the addresses that objdump shows, from within PinTool?

--RUN1--

op:       MOV addr:0x00007fac87a8d2d0

op: CALL_NEAR addr:0x00007fac87a8d2d3

op:      PUSH addr:0x00007fac87a90a70

--RUN2--

op:       MOV addr:0x00007fc529f402d0

op: CALL_NEAR addr:0x00007fc529f402d3

op:      PUSH addr:0x00007fc529f43a70
Minar Mahmud
kc2uno
  • Unless you are running your program on some embedded microcontroller where it is **the only thing running**, then yes, the instruction pointer address **should** change between runs. On a normal computer, there are a myriad of programs starting/stopping all the time that affect where in memory your code is executed. – David C. Rankin Mar 27 '15 at 02:31
  • @david-c-rankin That doesn't seem like the right explanation. Your statement is true for *physical* instruction addresses but not for *virtual* instruction addresses. PinTool prints out the virtual addresses. – kaylum Mar 27 '15 at 02:51
  • My guess would be Address Space Layout Randomisation (ASLR). https://en.wikipedia.org/wiki/Address_space_layout_randomization – kaylum Mar 27 '15 at 02:52
  • Also gonna go with ASLR. – Ron Thompson Mar 27 '15 at 03:51
  • I added a (clumsy) way to correlate addresses from libraries to my answer. Sorry for being a bit rambling. – Ulfalizer Mar 27 '15 at 05:41

2 Answers


(tl;dr version with a possible solution at the end.)

It's almost certainly due to address space randomization applied to shared libraries. Running the following command a few times will let you see how it works:

$ cat /proc/self/maps

/proc/self/ refers to the current process (the one opening the file). There are also /proc/<pid>/ directories for specific PIDs. The maps file lists the memory mappings for the process, in this case for the cat process itself.

Here's the output for one run on my system:

00400000-0040c000 r-xp 00000000 08:01 3409248            /bin/cat
0060b000-0060c000 r--p 0000b000 08:01 3409248            /bin/cat
0060c000-0060d000 rw-p 0000c000 08:01 3409248            /bin/cat
0063a000-0065b000 rw-p 00000000 00:00 0                  [heap]
7f017ef95000-7f017f761000 r--p 00000000 08:01 8126750    /usr/lib/locale/locale-archive
7f017f761000-7f017f91b000 r-xp 00000000 08:01 11155466   /lib/x86_64-linux-gnu/libc-2.19.so
7f017f91b000-7f017fb1a000 ---p 001ba000 08:01 11155466   /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1a000-7f017fb1e000 r--p 001b9000 08:01 11155466   /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb1e000-7f017fb20000 rw-p 001bd000 08:01 11155466   /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb20000-7f017fb25000 rw-p 00000000 00:00 0 
7f017fb25000-7f017fb48000 r-xp 00000000 08:01 11155454   /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd1c000-7f017fd1f000 rw-p 00000000 00:00 0 
7f017fd23000-7f017fd47000 rw-p 00000000 00:00 0 
7f017fd47000-7f017fd48000 r--p 00022000 08:01 11155454   /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd48000-7f017fd49000 rw-p 00023000 08:01 11155454   /lib/x86_64-linux-gnu/ld-2.19.so
7f017fd49000-7f017fd4a000 rw-p 00000000 00:00 0 
7fffacef5000-7fffacf16000 rw-p 00000000 00:00 0          [stack]
7fffacf5a000-7fffacf5c000 r-xp 00000000 00:00 0          [vdso]
7fffacf5c000-7fffacf5e000 r--p 00000000 00:00 0          [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]

The top three lines are the code segment, read-only data segment, and read-write data segment from your executable. The remaining lines are the stack, heap, various segments for shared libraries, memory-mapped files (as a side note, libraries are just memory-mapped files too), and some internal stuff related to how certain system calls are implemented.

If you repeat the command a few times, you will likely see all mappings except for the code and data segments from the executable move around randomly. This is a security measure. Not knowing where things are in memory makes certain exploits harder to pull off, as you can't just directly jump to some address you know will have a useful routine for example.

The main reason why address space randomization is not applied to the code and data segments from the executable itself is probably efficiency. Code that isn't loaded to a fixed address has to be position-independent, which adds some overhead. This is why shared libraries need to be explicitly compiled with -fPIC.

(Shared libraries also need to be position-independent for other reasons besides security. Using a fixed address for each library would cause problems if two libraries happened to get overlapping load addresses.)

Unfortunately I'm not familiar with PinTool. I believe GDB simply disables address space randomization (using the personality(2) system call) to get predictable addresses for shared libraries.

Address space randomization can be turned off for a single shell session (this seems to use personality() under the hood too), or globally by doing echo 0 > /proc/sys/kernel/randomize_va_space (see the /proc/sys/ documentation).
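As a rough sketch of the per-process approach, the snippet below builds a command line that runs a program under `setarch` with the `-R` (`--addr-no-randomize`) flag from util-linux. The helper name `no_aslr_command` is made up for illustration, and `pin`, `pintool`, and `app` are placeholders rather than real paths.

```python
import platform
import shlex


def no_aslr_command(argv, arch=None):
    """Prefix argv with a setarch call that disables address randomization.

    Assumes the util-linux `setarch` binary is installed; `-R` is its
    short form of `--addr-no-randomize`.
    """
    arch = arch or platform.machine()  # e.g. "x86_64" on a 64-bit PC
    return ["setarch", arch, "-R", *argv]


# Placeholder command: substitute your own pin binary, tool, and application.
cmd = no_aslr_command(["pin", "-t", "pintool", "--", "app"], arch="x86_64")
print(shlex.join(cmd))  # setarch x86_64 -R pin -t pintool -- app
```

Passing the resulting list to `subprocess.run(cmd)` would launch the program. Only that process and its children are affected, so the system-wide setting stays untouched.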

I found the following on this page. Might be relevant.

Does Pin change the application code and data addresses?

...

Note: Recent linux kernels intentionally move the location of stack and dynamically allocated data from run to run, even if you are not using pin. On RedHat-based systems you can workaround this by running Pin as follows:

$ setarch i386 pin -t pintool -- app

tl;dr answer

If all you need to do is to correlate addresses from PinTool that happen to come from libraries to objdump disassembly addresses, and you don't mind doing some manual work each time, then the following should work:

  1. Print /proc/self/maps from within your process. (You could also run it in the background and print /proc/<pid>/maps from the shell, using e.g. $! to get the PID.)

  2. Check which mapping the address belongs to. In the library case, it'll probably be the text segment of some library (marked r-xp in the maps output).

  3. Subtract the start address of the mapping from the address you see in PinTool. If the mapping's file offset (the third column) is nonzero, you will usually need to add it back.

This will get you the addresses you see in an objdump disassembly (when you run it on that same library). You could use addr2line(1) to get the source line too if the library has debugging information.
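The steps above could be automated along these lines. This is a minimal sketch, not a tested workflow: the names `Mapping`, `parse_maps`, and `locate` are invented here, the sample lines are taken from the `cat` output earlier in this answer, and the runtime address `0x7f017fb262d0` is a made-up value inside the `ld-2.19.so` mapping. It also assumes a shared object linked at base address 0 whose segment virtual addresses equal their file offsets, which is what makes the result comparable to objdump addresses.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    file_offset: int
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of a /proc/<pid>/maps file."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""  # anonymous mappings have no path
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


def locate(mappings: List[Mapping], addr: int) -> Optional[Tuple[str, int]]:
    """Return (path, offset into the mapped file) for a runtime address."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m.path, addr - m.start + m.file_offset
    return None


SAMPLE = """\
7f017f761000-7f017f91b000 r-xp 00000000 08:01 11155466   /lib/x86_64-linux-gnu/libc-2.19.so
7f017fb25000-7f017fb48000 r-xp 00000000 08:01 11155454   /lib/x86_64-linux-gnu/ld-2.19.so
"""

# Hypothetical address, as if it had been printed by a Pin tool.
path, offset = locate(parse_maps(SAMPLE), 0x7F017FB262D0)
print(path, hex(offset))  # /lib/x86_64-linux-gnu/ld-2.19.so 0x12d0
```

In practice you would read the maps text from `/proc/<pid>/maps` while the instrumented program is still running, since the mappings are different on every run.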

There might be nicer workflows of course. This worked for me when playing around a bit with dlopen(3) and dlsym(3) at least. Core dumps should contain library load addresses, so maybe that could be used somehow...

Ulfalizer

Because the program was loaded into a different memory address.

Programs are compiled to (for some parts) use a fixed memory layout; however, they don't know which addresses they are going to use because they can't be compiled with knowledge of which memory addresses are going to be free when the computer runs.

So their internal addresses are really "offsets" from the "start of program" memory address. As the program gets copied into RAM, the "starting" memory address gets added to all the offset addresses.

This also accounts for why the low order bits are consistent, even when the high order bits are not. You are looking at the same block of code twice, loaded into different starting memory addresses. This means the offsets are identical, but the addresses are not.
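The addresses from the question can be used to check this. The sketch below (helper names are made up for illustration, and 4 KiB pages are assumed) compares each address with the first one in its run, and also looks at the low 12 bits, which page-aligned loading leaves unchanged.

```python
# Addresses copied from RUN1 and RUN2 in the question.
RUN1 = [0x00007FAC87A8D2D0, 0x00007FAC87A8D2D3, 0x00007FAC87A90A70]
RUN2 = [0x00007FC529F402D0, 0x00007FC529F402D3, 0x00007FC529F43A70]

PAGE_MASK = 0xFFF  # low 12 bits, assuming 4 KiB pages


def relative_offsets(addrs):
    """Distance of each address from the first address in the run."""
    return [a - addrs[0] for a in addrs]


def page_offsets(addrs):
    """Offset of each address within its page."""
    return [a & PAGE_MASK for a in addrs]


print([hex(x) for x in relative_offsets(RUN1)])          # ['0x0', '0x3', '0x37a0']
print(relative_offsets(RUN1) == relative_offsets(RUN2))  # True
print([hex(x) for x in page_offsets(RUN1)])              # ['0x2d0', '0x2d3', '0xa70']
print(page_offsets(RUN1) == page_offsets(RUN2))          # True
```

Both runs give identical relative and in-page offsets, which is consistent with the same code being mapped at a different page-aligned base each time.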

Edwin Buck