There is a lot of complexity that goes into any sort of complete answer here. For MIPS assembly, though, we may [see comments below] get a bit of a break.
We will need to consider addressing modes and the concept of relative addressing vs absolute addressing. This is because, as zwol mentioned in a comment, the outputs of compilers and assemblers are generally not actually ready-to-run code, but rather are object files, full of instructions that get interpreted by a linker and/or a loader.
A linker is a program that takes multiple object files and combines them into a more-complete program. This may take the form of another object file, or a library that's essentially a collection of object files. If the library format is simple enough, the library might be built simply by aggregating object files, with the option of adding a table of contents, but sometimes you want to do a certain amount of pre-linking, to connect particular object files together into an unbreakable unit, for later linking against more object files or libraries. Linkers can be quite complicated as they may have to deal with symbolic names (function and variable names) and provide information for debuggers (symbol tables, memory-region descriptions, and so on).
A loader takes object files that have often been at least partially resolved by a linker, sometimes completely resolved, and loads them into memory. Some loaders are themselves linkers, of a type usually referred to as a runtime linker or runtime loader. This allows executable object files to load other object files at run time, rather than pre-linking everything in advance.
One way or another, though, it's generally the load-time operation that assigns actual addresses to code and data. The object file may contain instructions that say that the code can run anywhere, or that the code must run at some particular (fixed) address. The same rules may apply to data. If a fixed address is required, it's possible that this address is not available, so relocatable code—code that can be moved from some sort of default address to another different address—is often desirable.
This leads to the concept of relative addressing. Suppose a machine works by repeatedly executing some very simple steps:
- Load instruction from address given by IP (Instruction Pointer) or PC (Program Counter) register.
- Increment this register by some constant, such as 4.
- Execute the instruction just loaded.
A branch instruction consists of a directive to change the IP/PC register, either to some new value, or by adding or subtracting some value.
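The three-step model and the two kinds of branch can be sketched as a toy interpreter. This is a hypothetical machine, not MIPS: the opcode names ("nop", "jabs", "jrel") and the tuple encoding are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]  # (opcode, operand); hypothetical encoding, not MIPS


def run(program: Dict[int, Instr], start: int, max_steps: int) -> List[int]:
    """Execute up to max_steps instructions; return the address of each one fetched."""
    pc = start
    trace: List[int] = []
    for _ in range(max_steps):
        if pc not in program:
            break                  # ran off the end of the loaded program
        trace.append(pc)
        op, arg = program[pc]      # 1. load instruction from the address in PC
        pc += 4                    # 2. increment PC by a constant (4)
        if op == "jabs":           # 3. execute: an absolute branch overwrites PC
            pc = arg
        elif op == "jrel":         #    a relative branch adds a signed byte offset
            pc += arg
        # "nop" does nothing
    return trace
```

For example, with a "nop" at 0x1000 and a ("jrel", -8) at 0x1004, the PC is already 0x1008 when the branch executes, so adding -8 lands back on 0x1000 and the trace alternates 0x1000, 0x1004, 0x1000, and so on.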
Now, suppose that the executable object file recommends that the program be loaded at address 0x04000000, for instance. Suppose further that instruction #10, which will be at address 0x04000028, is a branch instruction, and that it needs to set things up so that the next instruction will be loaded from 0x0400000c, i.e., instruction #3:
04000000 instruction#0
04000004 instruction#1
04000008 instruction#2
0400000c loop: instruction#3
04000010 #4
04000014 #5
04000018 #6
0400001c #7
04000020 #8
04000024 #9
04000028 j loop
0400002c
Given our model above, during the execution of instruction #10 (the j loop that jumps to instruction #3), the IP or PC register will hold the value 0400002c, because we described the operation as "load, increment-by-4, execute".
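The addresses in the listing, and the PC value seen while each instruction executes, follow from two lines of arithmetic. A minimal sketch, assuming 4-byte instructions numbered from #0 and the load-increment-execute order described above; the function names are hypothetical.

```python
def instr_addr(base: int, n: int) -> int:
    """Address of instruction #n when the program is loaded at base."""
    return base + 4 * n


def pc_during(base: int, n: int) -> int:
    """PC while instruction #n executes: already incremented past it."""
    return instr_addr(base, n) + 4


# The "j loop" is instruction #10; "loop" is instruction #3.
print(hex(instr_addr(0x04000000, 10)))  # 0x4000028
print(hex(pc_during(0x04000000, 10)))   # 0x400002c
print(hex(instr_addr(0x04000000, 3)))   # 0x400000c
```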
If we need to use absolute addressing, we need the actual j loop instruction to stuff the literal value 0400000c directly into the instruction-pointer register. However, it may only be the loader that knows whether the program is really running at 04000000. If that address was in use, the loader may have moved the program to 08000000 instead, and the value to shove into the i-p register is now 0800000c.
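Under absolute addressing, the literal that j loop must supply is the load address plus the offset of loop within the program, so it changes with every load address. A minimal sketch; `LOOP_OFFSET` and `absolute_target` are hypothetical names.

```python
LOOP_OFFSET = 0x0C  # "loop" is instruction #3, i.e. 3 * 4 bytes into the program


def absolute_target(load_base: int) -> int:
    """Literal value an absolute jump must place in the PC for this load address."""
    return load_base + LOOP_OFFSET


for base in (0x04000000, 0x08000000):
    print(f"loaded at {base:08x}: j loop must supply {absolute_target(base):08x}")
```

No single literal is right for both load addresses, which is why someone (assembler, linker, or loader) has to patch it once the real address is known.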
If we are using relative addressing, however, the j loop instruction needs to assemble to machine code that says, not "go to 0400000c", but rather "go forward or backwards from where we are now, 0400002c, to where we want to be at 0400000c". That's obviously a backwards leap, by 0400002c - 0400000c, or 20 (hexadecimal; 32 decimal) bytes, or eight instructions' worth.
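The relative displacement is just target minus the current (already incremented) PC, and it comes out the same wherever the program is loaded. A minimal sketch; `rel_displacement` is a hypothetical helper.

```python
def rel_displacement(pc_value: int, target: int) -> int:
    """Signed byte offset to add to the (already incremented) PC to reach target."""
    return target - pc_value


for base in (0x04000000, 0x08000000):
    pc_value = base + 0x2C  # PC while "j loop" (instruction #10) executes
    target = base + 0x0C    # address of "loop" (instruction #3)
    d = rel_displacement(pc_value, target)
    print(f"base {base:08x}: {d} bytes = {d // 4} instructions")
```

Both iterations report -32 bytes, or -8 instructions, so a relative encoding needs no load-time patching.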
Edit: See comments below; this next part was wrong. I was relying on the other StackOverflow answer and the web page I cite for assuming PC-relative jumps. I have updated this to use absolute addressing for j instructions.
MIPS processors use a register called pc (but it is difficult to access), and support relative addressing in conditional branches (e.g., beq; see Assembly PC Relative Addressing Mode). Hence some of the complexities could vanish: we need only instruct the CPU to jump backwards eight instructions, i.e., to add negative-eight to the PC register. The CPU automatically multiplies this value by 4, so that it adds negative-32. If we were really loaded at 04000000, pc will be 0400002c, and moving it back this much changes it to 0400000c, which is what we want. If we were really loaded at 08000000 instead, the same relative move lands us at 0800000c, which is again what we want.
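For a MIPS conditional branch such as beq, the 16-bit immediate holds a signed word offset that is sign-extended, multiplied by 4, and added to the address of the instruction following the branch. A minimal sketch of encoding and decoding that field; it models only the target arithmetic (not the branch delay slot's execution), and the function names are hypothetical.

```python
def encode_branch_offset(branch_addr: int, target: int) -> int:
    """16-bit immediate for a MIPS conditional branch (e.g. beq) at branch_addr."""
    delta = target - (branch_addr + 4)  # relative to the following instruction
    if delta % 4:
        raise ValueError("target must be word-aligned relative to the branch")
    words = delta // 4
    if not -(1 << 15) <= words < (1 << 15):
        raise ValueError("target out of range for a 16-bit branch offset")
    return words & 0xFFFF               # two's-complement 16-bit field


def branch_target(branch_addr: int, imm16: int) -> int:
    """Decode: sign-extend the field, multiply by 4, add to branch_addr + 4."""
    words = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (branch_addr + 4 + (words << 2)) & 0xFFFFFFFF


imm = encode_branch_offset(0x04000028, 0x0400000C)  # branch at #10, target #3
print(hex(imm))                                     # 0xfff8, i.e. -8 words
print(hex(branch_target(0x08000028, imm)))          # 0x800000c after moving the program
```

The same 0xfff8 field is correct at either load address, which is exactly the property that makes relative branches relocatable.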
This would be the case if we were using b instructions. But j instructions are absolute within a 256 MB region: they simply overwrite the low 28 bits of the program counter.
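Concretely, a MIPS j takes the top 4 bits from the address of the instruction after the jump and replaces the low 28 bits with its 26-bit field times 4. A minimal sketch (function names hypothetical) showing why an unpatched j goes to the wrong place when the program moves:

```python
def j_field(target: int) -> int:
    """26-bit instr_index field of a MIPS j instruction aimed at target."""
    if target % 4:
        raise ValueError("jump target must be word-aligned")
    return (target >> 2) & 0x03FFFFFF  # keeps only the low 28 bits of target


def j_target(j_addr: int, field: int) -> int:
    """Decode: keep the top 4 bits of (j_addr + 4), replace the low 28 bits."""
    return ((j_addr + 4) & 0xF0000000) | ((field & 0x03FFFFFF) << 2)


field = j_field(0x0400000C)              # assembled for a load at 04000000
print(hex(j_target(0x04000028, field)))  # 0x400000c: correct
print(hex(j_target(0x08000028, field)))  # 0x400000c: wrong if moved to 08000000
```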
Generally, we'll have an assembler output our absolute jump instruction with a relocation type that tells any runtime loader: add any load-time offset needed. So we just need to make sure that, as we assemble, we know where we intend to be loaded (whether that's just 0, or 04000000, or whatever), and we'll emit, for a j instruction, the absolute address of the target instruction, but also some additional linker/loader instructions that say: the constant in this instruction may need adjustment at link or load time. Note that the linker and loader must be smart enough to understand addressing constraints: it's not OK to move the program so that code that used to fit within one 256 MB region now spans two such regions, if the code segment uses j instructions to jump within the one region.
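A load-time fixup for one j instruction might look like the following. This is a hypothetical sketch, not any real loader's code: `relocate_j` recomputes the absolute target, re-encodes it for the new location, and refuses a move that would put the jump and its target in different 256 MB regions. It assumes the whole program moves by a word-aligned `delta`.

```python
J_OPCODE = 0x02  # MIPS j opcode (top 6 bits of the instruction word)


def relocate_j(word: int, old_addr: int, delta: int) -> int:
    """Hypothetical fixup: adjust a j at old_addr for a program moved by delta bytes."""
    if word >> 26 != J_OPCODE:
        raise ValueError("not a j instruction")
    if delta % 4:
        raise ValueError("programs must move by a multiple of 4 bytes")
    old_target = ((old_addr + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
    new_addr, new_target = old_addr + delta, old_target + delta
    # The jump can only reach targets in the 256 MB region of (new_addr + 4).
    if (new_addr + 4) & 0xF0000000 != new_target & 0xF0000000:
        raise ValueError("jump and target would land in different 256 MB regions")
    return (J_OPCODE << 26) | ((new_target >> 2) & 0x03FFFFFF)


word = (J_OPCODE << 26) | (0x0400000C >> 2)           # "j loop" assembled for base 04000000
print(hex(relocate_j(word, 0x04000028, 0x04000000)))  # 0xa000003: now targets 0800000c
```

Moving the same code by 0x0BFFFFF0 instead would place the jump at 0x10000018 and its target at 0x0FFFFFFC, on opposite sides of a region boundary, so the sketch raises rather than emit a jump that cannot reach its target.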
(The website https://en.wikibooks.org/wiki/MIPS_Assembly/MIPS_Details claims that j instructions are relative, but this appears to be wrong; see comments.)
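(Note that negative numbers are represented in two's complement. If the j instruction's 26-bit field were a relative address, as that page assumes, the automatic multiplication by 4 would give it a 28-bit signed range, from -2^27 to 2^27-4, or -08000000..07fffffc, in steps of 4. Since the field is in fact absolute, the same 26 bits times 4 select an address from 00000000 to 0ffffffc within the current 256 MB region.)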
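The ranges in the note can be checked by reading the same 26-bit field two ways: as a two's-complement word offset (the relative interpretation) and as an unsigned word index (how j actually uses it), each scaled by 4. A minimal arithmetic sketch:

```python
FIELD_BITS = 26

# Read as a signed (two's-complement) word offset, scaled by 4:
signed_lo = -(1 << (FIELD_BITS - 1)) * 4
signed_hi = ((1 << (FIELD_BITS - 1)) - 1) * 4
# Read as an unsigned word index, scaled by 4 (what j actually does):
unsigned_lo = 0
unsigned_hi = ((1 << FIELD_BITS) - 1) * 4

print(f"signed:   {signed_lo:#x} .. {signed_hi:#x}")      # -0x8000000 .. 0x7fffffc
print(f"unsigned: {unsigned_lo:#x} .. {unsigned_hi:#x}")  # 0x0 .. 0xffffffc
```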