
I need help understanding a basic implementation of the hello world program in assembly language using the SPARC instruction set. I have the fully functioning code below; I just need help understanding how it works. I could not figure out how to post my code with line numbers, so I apologize for any confusion when referring to specific lines of code. Any help is greatly appreciated.

First, the line with the cout .equ ... console data port and the line with the costat .equ ... console status port. Not sure what this means. It looks like they are assigning the terms "cout" and "costat" to memory addresses 0x0 and 0x4 respectively, but what is a console data port and a console status port, what is the significance of this line of code?

Next, the line with the "sethi" command really confuses me. I know it's related to setting the most significant 22 bits to something, but I don't understand its significance in this program. What are we accomplishing with this command? I really don't understand it at all.

Next, the loop subroutine. It looks like we're loading the contents of register 2 plus the hello world string (defined by a sequence of ASCII characters) and putting it in register 3. I'm not familiar with the HEX notation in the form of 0xnn where n refer to integers. Is this an abbreviated form of standard hexadecimal notation?

Next line of the loop, it looks like we are adding the contents of register 3 plus zero and storing the result in register 3. What is the significance of this? Why add zero?

Last line of the loop is 'be End'. I believe this means "branch if equal" to the subroutine called "End", but branch if equal to what? The notes say branch if null, but again, what is null? I'm not sure what this is referring to.

Next, we have the 'Wait' subroutine, which begins with an instruction to load an unsigned byte at the address of register 4 plus costat (our console status port) and store the result in register 1. Again, what does this mean, and what is this instruction doing in the program? By the way, when a term is in brackets like '[ ]', is that referring to the contents of the memory address, or to the memory address itself? I am constantly confused by this.

Next line we are using "and" with register 1 and another ASCII character and putting the result back in register 1. Maybe this is some sort of punctuation, perhaps the comma in "hello, world!" Again what is this command doing?

Next line is "be wait" which looks like "branch if equal to subroutine 'wait'" Again, branch if equal to what? also why call subroutine 'wait' from within the subroutine, is it a recursive call? What is going on here?

Next line I think is taking the byte in register 3 and storing it in the contents of register 4 plus cout (our console data port). This must have to do with outputting the characters to the console but how is this working, what is taking place in this line of code, why add register 4 to cout?

Next line seems to be incrementing register 2 to the next machine word, possibly relating to the next character in the hello world string.

Lastly, a "branch always" call back to the loop. What is the significance of this? Please explain if possible. Thank you.

! Prints "Hello, world! \n" in the msgarea.
! SRCTools version: vph 6/29/00, updated rez 4/16/02
! ARCTools version: mww converted 6/17/05

.begin
BASE    .equ 0x3fffc0       !Starting point of the memory mapped region
COUT    .equ 0x0            !0xffff0000 Console Data Port
COSTAT  .equ 0x4            !0xffff0004 Console Status Port.

         .org 2048
         add %r0, %r0, %r2
         add %r0, %r0, %r4
         sethi BASE, %r4

Loop:    ld [%r2 + String], %r3 !Load next char into r3
         addcc %r3,%r0,%r3
         be End                 ! stop if null.

Wait:    ldub [%r4+COSTAT], %r1
         andcc %r1, 0x80, %r1
         be Wait
         stb %r3, [%r4+COUT]    !Print to console
         add %r2, 4, %r2        !increment String offset (r2)
         ba Loop
End:     halt

        .org 3000

! The "Hello, world!" string

String: 0x48, 0x65, 0x6c, 0x6c, 0x6f
0x2c, 0x20, 0x77, 0x6f, 0x72 
0x6c, 0x64, 0x21, 0x0a, 0

.end
Trixie the Cat
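To connect the pieces, here is a rough Python model of what the program is meant to do. It is only a sketch under assumptions that are not part of the original: memory is a dict of 32-bit words, `load_status_byte` and `store_byte` are hypothetical stand-ins for the `ldub` and `stb` instructions, the toy console always reports ready, and branches take effect immediately (the comments under the answers discuss why that last point matters on real SPARC hardware).

```python
# Hypothetical Python model of what the ARC program above is meant to do.
# Assumptions: memory holds 32-bit words, the console is always ready,
# and branches take effect immediately (no delay slots).

BASE = 0x3FFFC0            # operand of sethi
COUT, COSTAT = 0x0, 0x4    # offsets of the console data/status ports
STRING = 3000              # address set by .org 3000

# One character per 32-bit word, terminated by a zero word.
chars = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20, 0x77,
         0x6F, 0x72, 0x6C, 0x64, 0x21, 0x0A, 0]
memory = {STRING + 4 * i: c for i, c in enumerate(chars)}

console = []               # bytes written to the console data port


def load_status_byte(address):
    """ldub from the status port: this toy console always reports ready."""
    assert address == 0xFFFF0004
    return 0x80


def store_byte(address, value):
    """stb to the data port: only the low 8 bits of the register are output."""
    assert address == 0xFFFF0000
    console.append(value & 0xFF)


r2 = 0                                  # add %r0, %r0, %r2
r4 = (BASE << 10) & 0xFFFFFFFF          # sethi BASE, %r4 -> 0xffff0000
while True:
    r3 = memory[r2 + STRING]            # Loop: ld [%r2 + String], %r3
    if r3 == 0:                         # addcc %r3, %r0, %r3 / be End
        break
    while (load_status_byte(r4 + COSTAT) & 0x80) == 0:  # Wait loop
        pass
    store_byte(r4 + COUT, r3)           # stb %r3, [%r4+COUT]
    r2 += 4                             # add %r2, 4, %r2 ; then ba Loop

print(bytes(console).decode("ascii"), end="")  # End: halt
```

Running it prints `Hello, world!` followed by a newline. Each line of the model is commented with the assembly it stands for, so it can be read side by side with the listing.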

2 Answers


What is a console data port and a console status port

That depends on your hardware. Apparently your console is memory mapped and uses those addresses for communication.

Next, the line with the "sethi" command really confuses me.

It is used to load r4 with 0xffff0000, which is the base address of the memory-mapped range. sethi places its 22-bit operand in the upper 22 bits of the register and clears the low 10 bits, so you need to shift that address right by 10 bits, which gives 0x3fffc0 (the value of BASE).
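A minimal Python sketch of that arithmetic, assuming the standard SPARC definition of sethi (the 22-bit immediate goes into bits 31..10 of the destination and bits 9..0 are cleared). The function name `sethi` is just an illustration:

```python
def sethi(imm22):
    """Model of SPARC sethi: put a 22-bit value in bits 31..10, zero bits 9..0."""
    if not 0 <= imm22 < (1 << 22):
        raise ValueError("sethi takes a 22-bit immediate")
    return imm22 << 10


BASE = 0x3FFFC0
r4 = sethi(BASE)               # sethi BASE, %r4
print(hex(r4))                 # 0xffff0000
print(hex(0xFFFF0000 >> 10))   # 0x3fffc0, which is how BASE was chosen
```

Because sethi overwrites the whole register, its result does not depend on what r4 held before. In standard SPARC assemblers such as GNU as you can write `sethi %hi(0xffff0000), %r4` and let the assembler do the shift; the original code does not show whether the ARC assembler supports `%hi`.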

I'm not familiar with the HEX notation in the form of 0xnn where n refer to integers. Is this an abbreviated form of standard hexadecimal notation?

You know leading zeroes can be ignored, right? If I give you $00000100 you won't be a millionaire.

What is the significance of this, why add zero?

The important part is the cc suffix. addcc sets the condition codes (flags) according to the result of the addition, so the following be can check whether r3 is zero.

Next line is "be wait" which looks like "branch if equal to subroutine 'wait'"

Wait is not a subroutine; it's just a label. be simply looks at the zero flag, which was set by the preceding cc instruction (see the previous point, and check an instruction set reference).

why add register 4 to cout

Because COUT is just an offset from the start of the memory-mapped region, which r4 points to. You really want to write to 0xffff0000, and that is calculated as 0xffff0000 + 0. Of course, knowing COUT is zero, you could omit the addition.

Lastly, a "branch always" call back to the loop. What is the significance of this, please explain if possible.

You should have understood this part yourself. Obviously it's going back to print the next character.

Jester

but what is a console data port and a console status port, what is the significance of this line of code?

It would help you to understand how an old-fashioned serial port works. Usually there is a hardware register that you write to in order to output a character, and a status register that you poll to find out when it is safe to write another character of output. The reason is that it takes time for the byte of data to be sent out on the serial line bit by bit, and if you write a character too soon after the previous one, the output will be garbled.
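The following Python toy model illustrates the point. The class `ToySerialPort`, its `busy_ticks` parameter, and the rule that a byte written while the port is busy replaces the one still being sent are all hypothetical simplifications, not a description of any real UART; real hardware may instead drop or garble data.

```python
class ToySerialPort:
    """Hypothetical serial port with one data register and one status register.

    Shifting a byte out takes `busy_ticks` ticks. While busy, the ready bit
    (0x80) in the status register reads as 0. In this toy model, a byte written
    while the port is busy replaces the one still being sent, so that byte is lost.
    """

    READY = 0x80

    def __init__(self, busy_ticks=3):
        self.busy_ticks = busy_ticks
        self.remaining = 0   # ticks left before the current byte is on the wire
        self.pending = None  # byte currently being shifted out
        self.line = []       # bytes that were actually transmitted

    def status(self):
        return self.READY if self.remaining == 0 else 0

    def write_data(self, byte):
        self.pending = byte            # overwrites any byte not yet sent
        self.remaining = self.busy_ticks

    def tick(self):
        """Advance the hardware by one time step."""
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.line.append(self.pending)


def send(port, data, poll):
    for byte in data:
        if poll:
            # Poll the status register until the ready bit is set.
            while not port.status() & ToySerialPort.READY:
                port.tick()
        port.write_data(byte)
        port.tick()                    # the CPU moves on; time passes
    while not port.status() & ToySerialPort.READY:
        port.tick()                    # let the last byte finish
    return bytes(port.line)


print(send(ToySerialPort(), b"Hi", poll=True))   # b'Hi'
print(send(ToySerialPort(), b"Hi", poll=False))  # b'i': the 'H' was overwritten
```

Polling costs CPU time spent spinning, but it is the simplest way to keep the program in step with the hardware when there are no interrupts.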

I'm not familiar with the HEX notation in the form of 0xnn where n refer to integers. Is this an abbreviated form of standard hexadecimal notation?

It's not an abbreviated form of hex notation, it is simply hex notation. All hex numbers are of the form 0xnn, and the 'nn' may be any number of hex digits. So 0x48 means hex 48, which in decimal would be 4*16+8 = 72, and if you look at any chart of ascii characters, you'll see this is the letter 'H'.
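A quick Python check of that arithmetic, decoding the constants from the program's String table (the variable names are illustrative):

```python
# The string constants from the program, in the same order as the listing.
string_words = [0x48, 0x65, 0x6C, 0x6C, 0x6F,
                0x2C, 0x20, 0x77, 0x6F, 0x72,
                0x6C, 0x64, 0x21, 0x0A, 0]

print(0x48, 4 * 16 + 8, chr(0x48))        # 72 72 H
print(int("48", 16), int("0x0048", 16))   # 72 72 (leading zeros change nothing)

# Decode characters up to the terminating 0, as the program's loop does.
text = ""
for code in string_words:
    if code == 0:
        break
    text += chr(code)
print(repr(text))                          # 'Hello, world!\n'
```

Note that 0x2c is the comma, 0x20 the space, and 0x0a the newline, so the trailing 0 is what marks the end of the string.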

Next line of the loop, it looks like we are adding the contents of register 3 plus zero and storing the result in register 3. What is the significance of this? Why add zero?

The addcc instruction performs an add operation, but also combines it with a test operation. So the machine's condition codes are set according to the result of the add. In this case, we don't really care about adding anything, we just want to perform a test, so zero is added.

Last line of the loop is 'be End'. I believe this means "branch if equal" to the subroutine called "End", but branch if equal to what? The notes say branch if null, but again, what is null? I'm not sure what this is referring to.

It refers to the result of the last test operation. Since assembly language does not have an "if-then" instruction, this is how it's done: you test something, and then branch based on the result of the test. So the way to read these two lines of code would be "if r3 equals zero, then branch to End."
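As a sketch, here is a toy Python model of that test-then-branch pair. It only tracks the Z (zero) and N (negative) condition codes; real SPARC addcc also sets V (overflow) and C (carry), which this example ignores because be only looks at Z.

```python
MASK32 = 0xFFFFFFFF


def addcc(a, b):
    """Toy model of SPARC addcc: 32-bit add that also reports N and Z."""
    result = (a + b) & MASK32
    flags = {"Z": result == 0, "N": bool(result & 0x80000000)}
    return result, flags


def be_taken(flags):
    """be ("branch on equal") is taken exactly when the Z flag is set."""
    return flags["Z"]


for r3 in (0x48, 0x00):                  # 'H' and the terminating 0
    r3, icc = addcc(r3, 0)               # addcc %r3, %r0, %r3
    action = "branch to End" if be_taken(icc) else "fall through to Wait"
    print(hex(r3), action)
# 0x48 fall through to Wait
# 0x0 branch to End
```

Adding zero leaves r3 unchanged, so the only useful effect of the instruction is the flag update.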

Next, we have the 'Wait' subroutine, which begins with an instruction to load an unsigned byte at the address of register 4 plus costat (our console status port) and store the result in register 1. Again, what does this mean, and what is this instruction doing in the program? By the way, when a term is in brackets like '[ ]', is that referring to the contents of the memory address, or to the memory address itself? I am constantly confused by this.

The square brackets always refer to something stored in memory, and the expression within the square brackets give the memory address. So this instruction means "add r4 and COSTAT and use this result as a memory address, then load the (byte) contents of that memory address into r1."
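In Python terms, the brackets behave like indexing into memory. The dict below is a hypothetical stand-in for the memory-mapped console registers, and the status value 0x80 is made up for illustration:

```python
COUT, COSTAT = 0x0, 0x4
r4 = 0xFFFF0000                 # base address left in r4 by sethi

# Hypothetical snapshot of the memory-mapped console registers.
memory = {
    0xFFFF0000: 0x00,           # console data port
    0xFFFF0004: 0x80,           # console status port (ready bit set)
}

address = r4 + COSTAT           # the expression inside [ ] is an address...
r1 = memory[address]            # ...and ldub loads the byte stored there
print(hex(address), hex(r1))    # 0xffff0004 0x80
```

So r1 ends up holding 0x80 (the contents), never 0xffff0004 (the address). The same base-plus-offset calculation gives 0xffff0000 for `[%r4+COUT]`.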

Next line we are using "and" with register 1 and another ASCII character and putting the result back in register 1. Maybe this is some sort of punctuation, perhaps the comma in "hello, world!" Again what is this command doing?

The andcc instruction performs a bitwise AND between two numbers, in this case the contents of r1 and the hex value 0x80. The result is stored in r1, and the 'cc' specifies that the condition codes are set according to that result. The 0x80 is not an ASCII character but a mask that selects a single bit (bit 7), which represents a status flag in the status byte. After you write a character to the serial output register, this status bit reads as zero until it is safe to write another character.
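A small Python illustration of the mask: whatever the other seven bits of the status byte contain, only bit 7 decides whether the result is zero. The status values are arbitrary examples, and the reading of bit 7 as "ready" comes from the code's use of 0x80, not from a hardware manual:

```python
READY_MASK = 0x80                     # binary 1000_0000: only bit 7 set


def andcc_sets_z(r1, mask=READY_MASK):
    """Z flag after andcc %r1, mask, %r1: set when the masked result is 0."""
    return (r1 & mask) == 0


print(bin(READY_MASK))                # 0b10000000
for status in (0x00, 0x7F, 0x80, 0xC3):
    bit7 = (status >> 7) & 1
    print(f"{status:#04x}: bit7={bit7} Z={andcc_sets_z(status)}")
# 0x00: bit7=0 Z=True
# 0x7f: bit7=0 Z=True
# 0x80: bit7=1 Z=False
# 0xc3: bit7=1 Z=False
```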

Next line is "be wait" which looks like "branch if equal to subroutine 'wait'" Again, branch if equal to what? also why call subroutine 'wait' from within the subroutine, is it a recursive call? What is going on here?

The "be wait" uses the result of the previous test to determine whether or not to take the branch. So you can read these two instructions as "if (r1 & 0x80) == 0 then goto wait". These instructions implement a loop which keeps testing this bit until it is safe to write another character. The bit is changed by the hardware, and as soon as that happens, the loop will terminate and control will go on to the next instruction.
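A small Python model of the Wait loop, assuming be is taken when the Z flag is set, that is, when r1 & 0x80 is zero, so the loop keeps polling while the ready bit is clear. The list of status values passed in is a hypothetical sequence of hardware reads:

```python
def wait_until_ready(status_reads):
    """Model of the Wait loop; returns how many times the status port was read.

    `status_reads` is a hypothetical sequence of values that successive
    reads of the console status port return.
    """
    reads = iter(status_reads)
    count = 0
    while True:
        r1 = next(reads)          # Wait: ldub [%r4+COSTAT], %r1
        count += 1
        r1 &= 0x80                # andcc %r1, 0x80, %r1 (Z set if r1 == 0)
        if r1 != 0:               # Z clear: be Wait not taken, port is ready
            return count
        # Z set: be Wait is taken and the port is polled again


print(wait_until_ready([0x00, 0x00, 0x80]))  # 3
print(wait_until_ready([0x80]))              # 1
```

Branching back to the same label is an ordinary loop, not a recursive call: nothing is saved or returned, execution simply jumps back to the ldub.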

Next line I think is taking the byte in register 3 and storing it in the contents of register 4 plus cout (our console data port). This must have to do with outputting the characters to the console but how is this working, what is taking place in this line of code, why add register 4 to cout?

r4 represents the "base address" where the serial port registers live, and cout and costat are offsets into that region. So here you are just writing a character to the output register.

Next line seems to be incrementing register 2 to the next machine word, possibly relating to the next character in the hello world string.

That is clearly the intent, however it makes no sense to add 4 to r2 as that will skip over 3 characters. This looks like a bug, and should be just adding 1.

Rob Gardner
  • r2 += 4 is not a bug: note that it uses `ld` to load a whole word from the string (instead of `ldub`), and SPARC doesn't allow unaligned loads. Rather, the string is stored one character per word! Presumably this is why it's written out as hex constants instead of `.asciz "Hello World\n"`. It's weird that there's no `.word` directive or anything; apparently this assembler just assembles hex constants into words in the output if they appear on a line without an instruction, directive, or pseudo-instruction. – Peter Cordes Mar 20 '18 at 02:29
  • This isn't the only dumb thing, either: `sethi` clears the low bits, so zeroing `%r4` before `sethi` is also useless. The `addcc %r3,%r0,%r3` to set flags based on `%r3` should use `%r0` as a destination so it's not part of the dependency chain for later instructions that read `%r3`. The loop could also be structured with the conditional branch at the bottom, instead of a `ba`. (Although with a polling delay inside the loop, only code size matters, or maybe the performance of leaving the loop.) – Peter Cordes Mar 20 '18 at 02:32
  • You're right about the r2 += 4 thing; I did not see that the string was just a sequence of words. The whole thing is an awful code example, though. I guess this code is meant to run on some alien SPARC processor that doesn't use delay slots: the _stb_ is in the delay slot of the _be_, and the _halt_ is in the delay slot of the _ba_ instructions, which would render the entire thing wildly non-functional. – Rob Gardner Mar 20 '18 at 03:49
  • Hah, yeah I *thought* SPARC had branch delay slots, but I thought I must have mis-remembered based on this code :P. Maybe that assembler fills branch-delay slots with `nop` automatically? Or maybe it's for a simulator that (like the MARS and SPIM MIPS simulators) defaults to *not* having branch-delay slots. – Peter Cordes Mar 20 '18 at 03:56
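The point about the one-character-per-word layout can be checked with a short Python sketch. It assumes, as the first comment suggests, that the assembler turns each hex constant into its own 32-bit word, and that words are stored big-endian as on SPARC; `ld` and `ldub` are toy helpers named after the instructions:

```python
import struct

chars = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x2C, 0x20, 0x77,
         0x6F, 0x72, 0x6C, 0x64, 0x21, 0x0A, 0]

# Assumed layout: each constant becomes one 32-bit big-endian word.
image = struct.pack(f">{len(chars)}I", *chars)


def ld(offset):
    """Word load; like SPARC, reject offsets that are not multiples of 4."""
    if offset % 4:
        raise ValueError("unaligned word load")
    return struct.unpack_from(">I", image, offset)[0]


def ldub(offset):
    """Unsigned byte load."""
    return image[offset]


print(len(image))                               # 60 bytes for 15 words
print([hex(ld(r2)) for r2 in range(0, 16, 4)])  # ['0x48', '0x65', '0x6c', '0x6c']
print(ldub(0), ldub(3))                         # 0 72: the char is the low-order byte
```

With this layout, stepping r2 by 1 would make `ld` fault on alignment, and switching to `ldub` with a step of 1 would mostly read zero padding bytes, so adding 4 is consistent with the data.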
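To see why delay slots would matter, here is a toy Python interpreter for just the Wait fragment. With `delay_slots=True` it follows SPARC's PC/nPC model for a branch without the annul bit (the instruction after `be` always executes); with `delay_slots=False` the branch takes effect immediately, as the program's author apparently assumed. The tuple encoding of instructions and the `halt` stand-in are hypothetical simplifications:

```python
def run(status_reads, delay_slots):
    """Run the Wait/stb fragment; return how many times stb executed."""
    program = [
        ("ldub",),      # 0 Wait: ldub [%r4+COSTAT], %r1
        ("andcc",),     # 1       andcc %r1, 0x80, %r1
        ("be", 0),      # 2       be Wait
        ("stb",),       # 3       stb %r3, [%r4+COUT]
        ("halt",),      # 4       stand-in for the rest of the loop
    ]
    reads = iter(status_reads)
    r1, z, stores = 0, False, 0
    pc, npc = 0, 1
    while True:
        op = program[pc]
        next_npc = npc + 1
        if op[0] == "ldub":
            r1 = next(reads)
        elif op[0] == "andcc":
            r1 &= 0x80
            z = r1 == 0
        elif op[0] == "be" and z:
            if delay_slots:
                next_npc = op[1]       # jump happens after the delay slot
            else:
                pc, npc = op[1], op[1] + 1
                continue               # jump happens immediately
        elif op[0] == "stb":
            stores += 1
        elif op[0] == "halt":
            return stores
        pc, npc = npc, next_npc


print(run([0x00, 0x00, 0x80], delay_slots=False))  # 1: one character printed
print(run([0x00, 0x00, 0x80], delay_slots=True))   # 3: stb ran on every poll
```

With delay slots, the store runs on every pass through the polling loop, including the passes where the port is not ready. On real SPARC hardware the straightforward fix would be a `nop` after each branch, which is also what the last comment speculates the assembler might insert automatically.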