Step 4
Some opcodes reference memory that's relative to the opcode's location. For example, a function might have a constant or static piece of data. If it does, the compiler may opt to place that data right before the function starts (or right after it ends) and refer to it by saying "get the memory from 46 bytes earlier". That's the displacement: an offset from the contents of a register (in this case, EIP), used for referencing data relative to the register's contents.
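As a rough sketch of the "46 bytes earlier" idea, the C below models memory as a flat byte array and EIP as an index into it. The names `load_eip_relative` and `TOY_MEM_SIZE` are made up for illustration, and the model ignores real x86 details (for instance, real hardware measures the displacement from the end of the instruction rather than its start).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: memory is a byte array, eip is an index into it. */
enum { TOY_MEM_SIZE = 256 };

/* Return the byte located `disp` bytes away from the instruction at `eip`.
   A negative displacement means "earlier in memory". */
uint8_t load_eip_relative(const uint8_t *mem, size_t eip, int32_t disp)
{
    int64_t addr = (int64_t)eip + disp;        /* register contents + displacement */
    assert(addr >= 0 && addr < TOY_MEM_SIZE);  /* stay inside the toy memory */
    return mem[addr];
}
```

With an instruction at index 100 and a displacement of -46, the load reads index 54, wherever the function and its data happen to sit in memory as a pair.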
Step 5
The operands for opcodes are normally stored right after the opcode. So you might have some memory arranged like so: a b c. Here a is an opcode, b is the operand for a, and c is the next opcode.
If you only move EIP to the end of opcode a (so it references b), then in the next instruction cycle the computer will assume that b is the next opcode to execute. But b isn't supposed to be an opcode; it's an operand. The computer can't tell the difference between an opcode and an operand. It just assumes whatever EIP points to is an instruction and executes it. That's why EIP needs to be moved past the operand too.
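To make that concrete, here is a minimal toy interpreter in C. The opcodes `OP_HALT` and `OP_ADD` and the function `run` are invented for this sketch and have nothing to do with real x86 encodings; the only point is to show what happens when the instruction pointer stops short of the operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes: OP_ADD is followed by a one-byte operand. */
enum { OP_HALT = 0, OP_ADD = 1 };

/* Run a toy program and return the accumulator.
   If skip_operand is 0, the instruction pointer only moves past the opcode,
   so the operand is fetched as if it were the next opcode. */
int run(const uint8_t *mem, size_t len, int skip_operand)
{
    assert(mem != NULL);
    int acc = 0;
    size_t ip = 0;                      /* plays the role of EIP */
    while (ip < len) {
        uint8_t opcode = mem[ip];
        if (opcode != OP_ADD || ip + 1 >= len)
            break;                      /* OP_HALT, unknown opcode, or missing operand */
        acc += mem[ip + 1];             /* the operand sits right after the opcode */
        ip += skip_operand ? 2 : 1;     /* 2 = opcode + operand, 1 = opcode only */
    }
    return acc;
}
```

For the program `{ OP_ADD, 1, OP_ADD, 1, OP_HALT }`, `run(prog, 5, 1)` returns 2. With `skip_operand` set to 0 it returns 3, because each operand byte (which happens to equal `OP_ADD`) gets executed as an instruction in its own right.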
Step 6
An "effective" address is just an absolute one (relative to the start of memory) while the "complex" address the book refers to is relative to something else (often the contents of a register).
Step 4 showed that an opcode might not refer to an absolute memory address. It could just as easily refer to a relative one. In fact, programs very frequently refer to addresses that are relative to some register. For example, if you wrote some_struct.data in C and compiled it for an x86 processor, the compiled code would load the address of some_struct into a register (say, EAX), then hard-code data's offset from the base of some_struct into the operand. So if there are 5 bytes of data between the start of the struct and the start of the data element, then the instruction might look like load [EAX + 5] -> EBX, which means "take what's in EAX, add 5, fetch the data from that address and put it in EBX".
The thing is, the memory doesn't really understand relative addresses like this. It only understands absolute ones. So in order to access a relative address, the processor has to first add that 5 to whatever's in EAX to compute an absolute address. Then it can send that address to the memory controller and have it understood.
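The same base-plus-offset computation can be written out by hand in C. This is only a sketch: the struct layout (five single-byte members before `data`) and the helper name `load_data` are assumptions chosen so the offset comes out as 5, which is what mainstream compilers produce for this layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: 5 bytes of other data come before `data`.
   All members are single bytes, so there is no alignment reason for padding. */
struct some_struct {
    uint8_t other[5];
    uint8_t data;
};

/* What `s->data` boils down to: base address + constant offset, then a load. */
uint8_t load_data(const struct some_struct *s)
{
    assert(s != NULL);
    const uint8_t *base = (const uint8_t *)s;   /* "EAX": address of the struct */
    const uint8_t *effective =
        base + offsetof(struct some_struct, data);  /* "EAX + 5": absolute address */
    return *effective;                          /* fetch from that address */
}
```

`load_data(&s)` and `s.data` read the same byte; the function just spells out the addition the processor performs before it talks to memory.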
There are two basic types of relative addresses I've worked with (there are more I haven't).
- Register relative: The processor takes the contents of a register and uses that as the address in memory. Depending on the opcode and processor support, it may also add an operand to the register's contents first. Step 4 was dealing with this kind of addressing, with EIP as the register the address was relative to.
- Memory relative: Sometimes referred to as "indirect". The processor starts out with a register relative address, then automatically fetches the data at that address and treats it as the real address.
- Wikipedia describes many other addressing modes on its addressing modes page.
Memory relative took me a while to understand. Say you did a memory relative load where the register contains 10 and the offset is 5. The processor will add them together (10 + 5 = 15). Then, it'll go to that address (15 in this case) and grab whatever's there. If address 15 happens to contain the value 60, then 60 will be treated as the actual address and the processor will load the contents of address 60. If you're familiar with a language with pointers (e.g. C), memory relative is like a pointer-to-a-pointer.
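The 10 + 5 = 15 -> 60 walk-through can be sketched in C with a toy memory in which every address holds one 32-bit word. The function names and `WORD_MEM_SIZE` are hypothetical, and the value 1234 stored at address 60 is just a placeholder to show which cell ends up being read.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory: WORD_MEM_SIZE addresses, one 32-bit word per address. */
enum { WORD_MEM_SIZE = 128 };

/* Register relative: one memory access at reg + offset. */
uint32_t load_register_relative(const uint32_t *mem, uint32_t reg, uint32_t offset)
{
    uint32_t addr = reg + offset;
    assert(addr < WORD_MEM_SIZE);
    return mem[addr];
}

/* Memory relative (indirect): the word found at reg + offset is itself an address. */
uint32_t load_memory_relative(const uint32_t *mem, uint32_t reg, uint32_t offset)
{
    uint32_t pointer = load_register_relative(mem, reg, offset);  /* first fetch */
    assert(pointer < WORD_MEM_SIZE);
    return mem[pointer];                                          /* second fetch */
}
```

With `mem[15] = 60` and `mem[60] = 1234`, a register relative load with register 10 and offset 5 returns 60, while the memory relative load follows that 60 and returns 1234.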