(MIPS Assembler) Can we initialize the Program Counter on our own?

Question

I am still trying to develop a MIPS assembler as part of my assignment, and I was given these input and output files.

main:   lw $a0, 0($t0)          
begin:  addi $t0, $zero, 0      # beginning
    addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1       # top of loop
    bne $t2, $zero, finish
    add $t0, $t0, $t1
    addi $t1, $t1, 2
    j loop              # bottom of loop
finish: add $v0, $t0, $zero

and the output should be in machine code as follows:

10001101000001000000000000000000
00100000000010000000000000000000
00100000000010010000000000000001
00000000100010010101000000101010
00010101010000000000000000001000
00000001000010010100000000100000
00100001001010010000000000000010
00001000000000000000000000000011
00000001000000000001000000100000

And I noticed that the machine code that represents the instruction "j loop" is

00001000000000000000000000000011

According to the J-type instruction format, the last 26 bits would then represent the target address. I notice from the binary code I wrote above, the target address of this jump instruction (which is basically the address of "loop") is 00000000000000000000000011 which is 3.

I developed my version of the program and yet the target address it retrieved for "loop" is much larger than that.

I was wondering whether there is any way I could initialize the program counter to 0 (or some other value) so that I get the same target address as the one in the output file I was given. And how exactly does the program counter work? Does it increment itself for every line of code?

Please advise. Thank you!

Yes, it increments automatically when the code is run by the cpu. Since you are writing the assembler, it's up to you to maintain a copy of it. It's just the address of the current instruction. You can initialize it to anything. No magic involved. As I already said in your other question, the address is shifted by 2, hence it is actually `0x0c` in your case. — Jester, Feb 22 '19 at 15:23
Thanks for your answers. So, the address is **always** shifted by two, right? But how exactly do I initialize it...? Is it as simple as assigning a zero to some variable? I assume that's not the case...? — Tina, Feb 22 '19 at 15:29
Loading the machine code into RAM and initializing the PC to point to the start address is not the assembler's job. In a complete operating system, usually the assembler produces an "object code" file, with annotations on all the jump instructions, called "relocation records", that tell the next stage of processing, "linking", to adjust the addresses to match where the code will actually be placed in memory. This is not a very well-documented process; the only suggestion I have is an entire book, _[Linkers and Loaders](https://www.powells.com/book/-9781558604964)_ by John Levine. — zwol, Feb 22 '19 at 15:29
Your assembler presumably has a variable that maintains the address of instructions. You need that to resolve labels. Initialize that variable to zero and increment it by 4 for every instruction and do whatever is appropriate if you support directives. — Jester, Feb 22 '19 at 15:31
@Jester Just to clarify... that variable is something I define on my own, right? For example, I define it as `start_counter` then basically just increment it by 4 for every instruction in the file that was read? — Tina, Feb 22 '19 at 15:47
Yes that is correct. Whenever you encounter a label, you store the value of that counter as the address of that label and use it to calculate any references. — Jester, Feb 22 '19 at 15:49
@Jester Sorry, I don't know why I am having such a hard time understanding all of this... Since the address of the label it is something I define on my own, can I say that that memory address doesn't actually exist but is only defined so that the development of the assembler could be simpler? I heard somewhere that you can actually develop the assembler without finding all the labels beforehand. — Tina, Feb 22 '19 at 16:05
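
The location counter Jester describes in the comments above can be sketched as follows. This is a minimal illustration in Python (the thread does not say which language the assignment uses, so that choice is an assumption), and the names `build_symbol_table`, `encode_j`, and `start_address` are hypothetical. The first pass records each label at the current counter value and advances the counter by 4 per instruction; the second pass encodes `j` by dropping the two low bits of the label's address.

```python
def build_symbol_table(lines, start_address=0):
    """First pass: map each label to the address of the instruction it marks."""
    symbols = {}
    counter = start_address          # the assembler's own copy of the "PC"
    for line in lines:
        text = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not text:
            continue
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = counter
            text = text.strip()
        if text:                     # every real instruction occupies 4 bytes
            counter += 4
    return symbols


def encode_j(target_address):
    """Second pass helper: opcode 000010 followed by the 26-bit word address."""
    word = (0b000010 << 26) | ((target_address >> 2) & 0x03FFFFFF)
    return format(word, "032b")


SOURCE = """\
main:   lw $a0, 0($t0)
begin:  addi $t0, $zero, 0      # beginning
    addi $t1, $zero, 1
loop:   slt $t2, $a0, $t1       # top of loop
    bne $t2, $zero, finish
    add $t0, $t0, $t1
    addi $t1, $t1, 2
    j loop              # bottom of loop
finish: add $v0, $t0, $zero
""".splitlines()

symbols = build_symbol_table(SOURCE)
j_word = encode_j(symbols["loop"])
```

With `start_address=0`, `loop` resolves to 12 (`0x0c`) and `j_word` matches the expected output line for `j loop`. Starting the counter at a nonzero value would produce a correspondingly larger target field.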

torek · Answer 1 · 2019-02-23T23:50:04.547

There is a lot of complexity that goes into any sort of complete answer here. For MIPS assembly, though, we may [see comments below] get a bit of a break.

We will need to consider addressing modes and the concept of relative addressing vs absolute addressing. This is because, as zwol mentioned in a comment, the outputs of compilers and assemblers are generally not actually ready-to-run code, but rather are object files, full of instructions that get interpreted by a linker and/or a loader.

A linker is a program that takes multiple object files and combines them into a more-complete program. This may take the form of another object file, or a library that's essentially a collection of object files. If the library format is simple enough, the library might be built simply by aggregating object files, with the option of adding a table of contents, but sometimes you want to do a certain amount of pre-linking, to connect particular object files together into an unbreakable unit, for later linking against more object files or libraries. Linkers can be quite complicated as they may have to deal with symbolic names (function and variable names) and provide information for debuggers (symbol tables, memory-region descriptions, and so on).

A loader takes object files that have often been at least partially resolved by a linker, sometimes completely resolved, and loads them into memory. Some loaders are themselves linkers, of a type usually referred to as a runtime linker or runtime loader. This allows executable object files to load other object files at run-time, rather than pre-linking everything in advance.

One way or another, though, it's generally the load-time operation that assigns actual addresses to code and data. The object file may contain instructions that say that the code can run anywhere, or that the code must run at some particular (fixed) address. The same rules may apply to data. If a fixed address is required, it's possible that this address is not available, so relocatable code—code that can be moved from some sort of default address to another different address—is often desirable.

This leads to the concept of relative addressing. Suppose a machine works by repeatedly executing some very simple steps:

1. Load instruction from address given by IP (Instruction Pointer) or PC (Program Counter) register.
2. Increment this register by some constant, such as 4.
3. Execute the instruction just loaded.

A branch instruction consists of a directive to change the IP/PC register, either to some new value, or by adding or subtracting some value.

Now, suppose that the executable object file recommends that the program be loaded at address 0x04000000, for instance. Suppose further that instruction #10 (counting from zero, so it sits at address 0x04000028) is a branch instruction, and that it needs to set things up so that the next instruction will be loaded from 0x0400000c, i.e., instruction #3:

04000000       instruction#0
04000004       instruction#1
04000008       instruction#2
0400000c loop: instruction#3
04000010       #4
04000014       #5
04000018       #6
0400001c       #7
04000020       #8
04000024       #9
04000028       j   loop
0400002c

Given our model above, the IP or PC register will, during the execution of instruction #10, the j loop that jumps to instruction #3, hold the value 0400002c, because we described the operation as "load, increment-by-4, execute".
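
The "load, increment-by-4, execute" model can be checked with a toy loop. This is only a sketch of the simplified machine described above, not of a real MIPS pipeline (it ignores branch-delay slots, for example), and the names `BASE`, `program`, and `pc_seen_by_jump` are hypothetical.

```python
BASE = 0x04000000

# Instructions #0..#9 are placeholders; instruction #10 is "j loop".
program = {BASE + 4 * i: f"instruction#{i}" for i in range(10)}
program[BASE + 0x28] = "j loop"

pc = BASE
pc_seen_by_jump = None
while pc in program:
    instruction = program[pc]       # 1. load from the address held in pc
    pc += 4                         # 2. increment before executing
    if instruction == "j loop":     # 3. execute
        pc_seen_by_jump = pc        # value of pc while the jump executes
        break
```

Because the increment happens before execution, `pc_seen_by_jump` is the address of the slot after the jump, `0x0400002c`, rather than the jump's own address.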

If we need to use absolute addressing, we need the actual j loop instruction to stuff the literal value 0400000c directly into the instruction-pointer register. However, it may only be the loader that knows whether the program is really running at 04000000. If that address was in use, the loader may have moved the program to 08000000 instead, and the value to shove into the i-p register is now 0800000c instead.

If we are using relative addressing, however, the j loop instruction needs to assemble to machine code that says, not "go to 0400000c", but rather "go forward or backwards from where we are now, 0400002c, to where we want to be at 0400000c". That's obviously a backwards leap, by 0400002c - 0400000c or 20 (hexadecimal, 32 decimal) bytes, or eight instructions' worth.

Edit: See comments below, this next part was wrong—I was relying on the other StackOverflow answer and the web page I cite for assuming PC-relative jumps. I have updated this to use absolute addressing for j instructions.

MIPS processors use a register called pc (but difficult to access), and support relative addressing in conditional branches (e.g., beq; see Assembly PC Relative Addressing Mode). Hence some of the complexities could vanish: we need only instruct the CPU to jump backwards eight instructions, i.e., to add negative-eight to the PC register. The CPU automatically multiplies this value by 4, so that it adds negative-32. If we were really loaded at 04000000, pc will be 0400002c and moving it back this much changes it to 0400000c, which is what we want. If we were really loaded at 08000000 instead, the same relative move lands us at 0800000c, which is what we want.
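
As a rough sketch of that arithmetic for a conditional branch such as `beq`, the following Python computes the 16-bit offset field and then applies it the way the paragraph describes. The function names are hypothetical, and the model again ignores delay-slot details beyond using the incremented PC as the base.

```python
def branch_offset_field(branch_address, target_address):
    """16-bit two's-complement word offset, relative to the incremented PC."""
    pc_after_fetch = branch_address + 4
    word_offset = (target_address - pc_after_fetch) // 4
    if not -(1 << 15) <= word_offset < (1 << 15):
        raise ValueError("target out of range for a 16-bit branch offset")
    return word_offset & 0xFFFF


def branch_target(branch_address, field):
    """Sign-extend the field, multiply by 4, add to the incremented PC."""
    word_offset = field - 0x10000 if field & 0x8000 else field
    return branch_address + 4 + 4 * word_offset


# Assembled assuming a load address of 04000000: offset is -8 instructions.
field = branch_offset_field(0x04000028, 0x0400000C)

# The same field works wherever the code is actually loaded.
target_at_04 = branch_target(0x04000028, field)
target_at_08 = branch_target(0x08000028, field)
```

Here `field` is `0xfff8` (negative eight in 16-bit two's complement), and the same encoded value reaches `0x0400000c` or `0x0800000c` depending only on where the branch itself sits.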

This would be the case if we were using b instructions. But j instructions are absolute within a 256 MB region: they simply overwrite the low 28 bits of the program counter.
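
A small sketch of how a `j` target is formed may make the difference concrete. The helper names `j_field` and `j_target` are hypothetical; the rule they implement is the one stated above, with the top four bits taken from the incremented PC as noted in the comments below.

```python
def j_field(target_address):
    """26-bit field stored in a j instruction: the target's word address."""
    return (target_address >> 2) & 0x03FFFFFF


def j_target(jump_address, field):
    """Keep the top 4 bits of the incremented PC; replace the low 28 bits."""
    return ((jump_address + 4) & 0xF0000000) | (field << 2)


# Assembled assuming the code is loaded at 04000000.
field = j_field(0x0400000C)

target_if_loaded_at_04 = j_target(0x04000028, field)
target_if_loaded_at_08 = j_target(0x08000028, field)
```

Both results are `0x0400000c`: unlike the relative branch, an unpatched `j` does not follow the code when it is loaded somewhere else in the same 256 MB region.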

Generally, we'll have an assembler output our absolute jump instruction with a relocation type that tells any runtime loader: add any load-time offset needed. So we just need to make sure that, as we assemble, we know where we intend to be loaded—whether that's just 0, or 04000000, or whatever—and we'll emit, for a j instruction, the absolute address of the target instruction, but also some additional linker/loader instructions that say: The constant in this instruction may need adjustment at link or load time. Note that the linker and loader must be smart enough to understand addressing constraints: it's not OK to move the program so that what used to fit within one 256 MB region, now spans two such regions, if the code segment uses j instructions to jump within the one region.
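
To illustrate the idea, here is a hedged sketch of an assembler emitting a relocation record alongside a `j` word, and a loader applying it. The record layout, the `"JUMP26"` tag, and the function names are invented for this example and do not correspond to any particular object-file format.

```python
def assemble_j(target_address, word_index):
    """Return the 32-bit j word plus a (hypothetical) relocation record."""
    word = (0b000010 << 26) | ((target_address >> 2) & 0x03FFFFFF)
    return word, {"word_index": word_index, "kind": "JUMP26"}


def relocate(words, relocations, assumed_base, actual_base):
    """Loader step: adjust every recorded j field by the load-time offset."""
    size = 4 * len(words)
    # Refuse placements where the code would straddle two 256 MB regions.
    if actual_base >> 28 != (actual_base + size - 1) >> 28:
        raise ValueError("code would span two 256 MB regions")
    patched = list(words)
    delta = actual_base - assumed_base
    for rel in relocations:
        word = patched[rel["word_index"]]
        # Rebuild the absolute target the assembler had in mind.
        target = (assumed_base & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
        new_field = ((target + delta) >> 2) & 0x03FFFFFF
        patched[rel["word_index"]] = (word & 0xFC000000) | new_field
    return patched


# Ten placeholder words followed by "j loop", assembled for base 04000000.
j_word, j_reloc = assemble_j(0x0400000C, word_index=10)
words = [0] * 10 + [j_word]
patched = relocate(words, [j_reloc], 0x04000000, 0x08000000)
```

After relocation to `0x08000000`, the field in `patched[10]` decodes to `0x0800000c`, so the jump again lands on `loop`. The region check is deliberately conservative: it rejects any load address that would split the code across a 256 MB boundary.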

(Web site https://en.wikibooks.org/wiki/MIPS_Assembly/MIPS_Details claims that j instructions are relative, but this appears to be wrong; see comments.)

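(Note that negative offsets are represented as two's complement. A conditional branch takes a 16-bit signed offset that the CPU automatically multiplies by 4 for you, so it can reach from -2¹⁷ to 2¹⁷-4 bytes, or -00020000..0001fffc, in steps of 4, relative to the incremented PC. The 26-bit field of a j instruction is not signed and not relative: multiplied by 4, it supplies the low 28 bits of the target, 00000000..0ffffffc in steps of 4, within the current 256 MB region.)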

MIPS `j` instructions are not exactly PC-relative. They're absolute within a region (top 4 address bits of PC+4 stay unchanged). [How to Calculate Jump Target Address and Branch Target Address?](//stackoverflow.com/q/6950230). Your examples don't work: without runtime fixups a `j 0x400000c` loaded at `0x800002c` would still jump to `0x400000c`, because those are both within the same 1GB region. If you want a truly relative unconditional branch, you use `beq $zero, $zero, target`, which *does* add an offset to PC. (Alias `b` = branch always). — Peter Cordes, Feb 23 '19 at 23:33
@PeterCordes: I'll buy that (it's actually closer to what I recall from many years ago), but then why does the other answer and web page I linked say they are pc-relative? — torek, Feb 23 '19 at 23:37
Because it's about branches, not jumps. (And unfortunately for you, neither the question nor answer mentioned that jumps aren't like that.) MIPS has 2 different categories of control-transfer instructions. (And with MIPS64r5 or r6 I forget which, also branches and jumps without a branch-delay slot.) — Peter Cordes, Feb 23 '19 at 23:39
Aha! OK, despite the examples I saw, those are actually for conditional branches. — torek, Feb 23 '19 at 23:40
Yes, but remember that you can encode an unconditional relative branch by testing `$zero` == itself. Like I said, the `b` pseudo-instruction does that. This is how you implement jumps in position-independent code. Anyway, to implement an assembler you have to support both. — Peter Cordes, Feb 23 '19 at 23:41
The web page should be updated: it specifically cites `j` instructions. Of course I don't have write access to that web page... — torek, Feb 23 '19 at 23:42
Oh, I had only looked at the other answer. https://en.wikibooks.org/wiki/MIPS_Assembly/MIPS_Details#J_Instructions is just plain wrong when it says "hardcoded offset from the current value of the PC register." But the encoding details are correct. However it's also wrong when it says "The final four bits will be borrowed from the address of the *current* instruction". PC+=4 happens before replacing the low bits, so it's the address of the branch-delay slot that matters. (And yes they're right it's 256 MB regions, not 1G like my first comment said. It's 4GB/2^4 not 4/4.) — Peter Cordes, Feb 23 '19 at 23:48
I submitted an edit on wikibooks https://en.wikibooks.org/w/index.php?title=MIPS_Assembly/MIPS_Details&oldid=3066351&diff=cur&diffonly=0. Hopefully someone will approve it. There were several other sections that also weren't as specific or useful as they could be, but not actual errors outside of the J section that I noticed. — Peter Cordes, Feb 24 '19 at 05:08
