I normally don't spend much time reading assembly, so the following compiler output confused me a little.

Say I compile this piece of C code on my Intel Core 2 Duo running OSX 10.6:

while (var != 69) // var is a global variable
{
    printf("Looping!\n");
}

The assembly for the "var != 69" comparison looks like:

cmpl    $69, _var(%rip)

I understand that it effectively means to compare the value "69" against the contents of the global variable "var", but I'm having a tough time understanding the "_var(%rip)" part. Normally, I expect there to be an offset value, like when referring to local variables on the stack (e.g. -4(%ebp)). However, I don't quite follow how offsetting the instruction pointer with the "_var" declaration will give me the contents of the global variable "var".

What exactly does that line mean?

Thanks.

lhumongous

1 Answer

This works very nearly the same as addressing local variables in the stack with offset(%ebp). In this case, the linker will set the offset field of that instruction to the difference between the address of var, and the value that %rip will have when that instruction executes. (If I remember correctly, that value is the address of the next instruction, because %rip always points to the instruction after the one currently executing.) The addition thus gives the address of var.
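To make the arithmetic concrete, here is a minimal sketch in C of the two halves of that calculation: what the linker stores in the offset field, and what the CPU adds back at run time. The function names are made up for illustration, and the 7-byte length assumes the imm8 form of the instruction (opcode, ModRM, 4-byte displacement, 1-byte immediate).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed length of `cmpl $imm8, disp32(%rip)`:
 * 1 opcode + 1 ModRM + 4 displacement + 1 immediate. */
enum { CMPL_RIP_IMM8_LEN = 7 };

/* Link time: the offset field is the target address minus the address
 * of the NEXT instruction (the value %rip holds while this one runs).
 * Hypothetical helper, not a real linker API. */
int32_t link_time_disp(uint64_t insn_addr, unsigned insn_len, uint64_t var_addr)
{
    int64_t disp = (int64_t)(var_addr - (insn_addr + insn_len));
    /* The field is only 32 bits wide, so the target must be within +/-2 GiB. */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return (int32_t)disp;
}

/* Run time: the CPU sign-extends the offset and adds it to %rip. */
uint64_t effective_addr(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    uint64_t rip = insn_addr + insn_len;   /* address of the next instruction */
    return rip + (uint64_t)(int64_t)disp;
}
```

With made-up addresses, an instruction at 0x100000e0c and var at 0x100001000 give a stored offset of 0x1ed, and adding that back to the next-instruction address 0x100000e13 lands on var again.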

Why do it this way? This is a hallmark of position-independent code. If the compiler had generated

cmpl $69, _var

and the linker had filled in the absolute address of var, then when you ran the program, the executable image would always have to be loaded into memory at one specific address, so that all the variables had the absolute addresses that the code expects. By doing it this way, the only thing that has to be fixed is the distance between the code and the data; the code plus data (i.e. the complete executable image) can be loaded at any address and it'll still work.
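The "only the distance has to be fixed" point can be modelled in a few lines of C. This is a sketch with an invented image layout (the two offsets are assumptions, as is the 4 KiB page alignment check), not output from any real linker: the absolute address of var moves with the load address, while the %rip-relative offset does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets from the start of the image, fixed at link time. */
enum {
    NEXT_INSN_OFFSET = 0x0e13,  /* instruction following the cmpl */
    VAR_OFFSET       = 0x1000   /* the global variable var */
};

/* Absolute address of var: changes whenever the image is loaded elsewhere. */
uint64_t var_absolute(uint64_t load_base)
{
    /* Assumption: images are mapped on 4 KiB page boundaries. */
    assert((load_base & 0xfff) == 0);
    return load_base + VAR_OFFSET;
}

/* %rip-relative offset to var: load_base appears in both terms and cancels. */
int64_t var_rip_displacement(uint64_t load_base)
{
    uint64_t rip = load_base + NEXT_INSN_OFFSET;
    return (int64_t)(var_absolute(load_base) - rip);
}
```

Calling var_rip_displacement with two different load bases returns 0x1ed both times, which is why the encoded instruction never needs patching when the image moves.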

... Why bother? Why is it bad to have to load an executable at one specific address? It isn't, necessarily. Shared libraries have to be position-independent, because otherwise you might have two libraries that wanted to be loaded at overlapping addresses and you couldn't use both of them in the same program. (Some systems have dealt with this by keeping a global registry of all libraries and the space they require, but obviously this does not scale.) Making executables position-independent is largely done as a security measure: it's somewhat harder to exploit a buffer overflow if you don't know where the program's code is in memory (this is called address space layout randomization).

zwol
  • Excellent answer and some good links! Disassembling this in gdb helped illustrate this, too; that line of code became "cmpl $0x45, 0x1ed(%rip)", where 0x1ed is the offset from the next instruction to the "var" variable. Thanks. – lhumongous Jun 27 '11 at 03:56
  • One thing to note: the notation `var(%rip)` is actually rather misleading; it's chosen for terseness rather than expressiveness. The more consistent notation would have been `[var-.](%rip)` and in fact this kind of thing is used in x86 (32-bit) PIC asm. – R.. GitHub STOP HELPING ICE Jun 27 '11 at 04:22
  • Another thing to note is that on x86/x64 at least, _absolute_ addressing (i.e. `cmpl $69, _var`) requires the operand (`_var`) to fit into a 32-bit quantity. That's due to the instruction encoding used in the architecture. When AMD devised the 64-bit extension, they kept the instruction encoding format but turned all 32-bit "absolute" operands (which are actually encoded as "32-bit displacement _without base/index register_") into `%rip`-relative ones. This creates an easy way to place both code and data into a 64-bit address space. – FrankH. Jun 27 '11 at 12:20
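To illustrate the encoding point in the last comment, here is a rough sketch in C that emits the bytes for `cmpl $imm8, disp32(%rip)`. The function name is invented for illustration. The ModRM byte uses mod=00 and rm=101 with no base or index register: in 32-bit mode that bit pattern means an absolute 32-bit address, and in 64-bit mode the same pattern is interpreted as %rip-relative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `cmpl $imm8, disp32(%rip)` into out[0..6]; returns the length.
 * Hypothetical helper for illustration only. */
size_t encode_cmpl_rip_imm8(uint8_t out[7], int32_t disp, int8_t imm)
{
    assert(out != NULL);
    uint32_t d = (uint32_t)disp;

    out[0] = 0x83;                    /* opcode: group 1, r/m32 with imm8 */
    out[1] = 0x3D;                    /* ModRM: mod=00, reg=7 (CMP), rm=101 */
    out[2] = (uint8_t)(d & 0xff);     /* disp32, little-endian */
    out[3] = (uint8_t)((d >> 8) & 0xff);
    out[4] = (uint8_t)((d >> 16) & 0xff);
    out[5] = (uint8_t)((d >> 24) & 0xff);
    out[6] = (uint8_t)imm;            /* the constant being compared */
    return 7;
}
```

For the offset gdb showed above (0x1ed) and the constant 69, this yields the bytes 83 3D ED 01 00 00 45.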