MIPS - Are characters which are stored as bytes the same as those not stored as bytes?

Question

I'm new to MIPS and I'm just wondering: suppose I store a space character in the following ways:

li $t0, ' '
lb $t1, ' '

la $t2, myArray  # load array
sb $t0, 0($t2)   # myArray[0] = ' '

In this case is $t0 == $t1? And is the sb instruction valid? What I'm a bit confused about is whether or not I can use bytes and ints (words) interchangeably.

`lb $t1, ' '` is not valid because `lb` requires a memory address. — Jester, Aug 30 '18 at 12:33
Which assembler do you use? Probably one of the MARS/SPIM simulators, right? Because that's not how `lb` and `sb` (first version of the question) work, or how their syntax is defined, but MARS will often assemble even lines with invalid syntax, as its parser is quite lenient. (The simple answer is "no", because you completely misunderstood what happens in each case.) — Ped7g, Aug 30 '18 at 12:36
Yeah, I'm using MARS/QtSpim. What I'm trying to do is store characters into variables and do some comparisons, and I'm not sure whether I can compare bytes with words, for example if (char == ' '). I suspect the "li" instruction will work, so I can just use its ASCII value in comparisons, right? — user10284022, Aug 30 '18 at 12:44
And "if (char == ' ')" is not assembly, so I'm not sure which particular instruction construct you would use for it. Keep in mind there are often many possible ways to write even a simple task in assembly, some of them more elegant and performant than others, but many of them still correct, even if a bit convoluted. What you show in your current question is fully correct except for the `lb $t1, ' '` (and the comment for `la`: it loads the address of the array into `t2`, not the array itself, but that's probably just unlucky wording on your side). The line with `t1` makes it unclear what to explain to you. — Ped7g, Aug 30 '18 at 12:46
although maybe I have some idea, how to fix it into [MCVE] making at least some sense, and which indeed contains some pitfalls for programmer who doesn't pay attention enough... I may try to "answer" the question you barely asked, hm... — Ped7g, Aug 30 '18 at 12:48
Well let's disregard the lb instruction, my intention is to fill a char array but my registers contained variables from li instructions so I wasn't sure if the sb $t0, 0($t2) would work or I'd have to do something else like sw $t0, 0($t2). — user10284022, Aug 30 '18 at 12:51
The `sb` will work in terms of storing the byte into memory, but if the value in the register is larger than a byte, it will be truncated during storage. I.e. `li $t0, 256` followed by `sb $t0, array` will set the first byte at address `array` to zero, because the low 8 bits of the value 256 are `0000_0000` in binary (the first set bit is at the ninth position, just outside of that range). Bytes and ints are 8 bits vs 32 bits, and you can interchange them only when you are aware of what kind of values are processed and whether the truncation/extension works as expected. That said, ASCII characters are 7-bit values (they fit in a byte). — Ped7g, Aug 30 '18 at 12:55

Ped7g · Accepted Answer · 2018-08-30T13:49:13.940

Byte and word are not freely interchangeable, because a byte is only 8 bits of information and a word is 32 bits of information (on the MIPS platform). So a byte can be set to 256 different possible values (2⁸ combinations of eight 0/1 bit values), and a word can be set to 256⁴ = 2³² different possible values (patterns of 32 bits).

You need four bytes to store the same amount of information that fits into a single word (8 bits * 4 = 32 bits).

But depending on the values you are processing, if you can guarantee their ranges, you can predict how code converting values between byte/half-word/word will behave: whether the values will survive such conversions undamaged, or whether extra validation/handling is needed. For example, if your input values are ASCII characters (from a string), then those are only 7 bits wide (ASCII defines only the values 0 to +127, so they are non-negative even when interpreted as signed 8-bit integers).
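As a minimal sketch of that point (MARS/SPIM syntax assumed; the label `text` and its contents are made up for illustration), a space loaded from a string with lb compares equal to a space loaded with li, because bit 7 of an ASCII character is always clear and sign extension only adds zero bits:

```mips
.data
text: .asciiz "a b"          # hypothetical test string; text[1] is the space
.text
    la   $t2, text           # $t2 = address of the string
    lb   $t1, 1($t2)         # loads 0x20; bit 7 is clear, so $t1 = 0x00000020
    li   $t0, ' '            # $t0 = 32 = 0x00000020
    tne  $t0, $t1            # traps only if they differ; here they are equal
    li   $v0, 10             # exit syscall
    syscall
```

In real code a `bne`/`beq` branch would usually take the place of `tne`; the trap is used here only to make a mismatch visible in the simulator.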

So for example li $t0, ' ' will assemble as li $t0, 32 (because the space character is encoded as the value 32), since the li instruction takes a signed integer immediate as its operand.

Actually "li" is not a real MIPS instruction but a convenience pseudo-instruction: the assembler converts it into one or two native instructions that encode/compose the desired immediate value. Try for example li $t0, ... with the values +1, -1, +65000, -65000 and watch in the debugger how it gets assembled into different native instructions, all achieving the desired "load immediate" effect. For example, the value -65000 needs at least two native instructions to be composed.
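A small program to try this with (a sketch, assuming MARS; the expansions in the comments are what MARS typically shows, other assemblers such as SPIM may pick different but equivalent native instructions):

```mips
.text
    li   $t0, 1          # fits signed 16 bits: typically addiu $t0, $zero, 1
    li   $t0, -1         # fits signed 16 bits: typically addiu $t0, $zero, -1
    li   $t0, 65000      # fits unsigned 16 bits: typically ori $t0, $zero, 65000
    li   $t0, -65000     # 0xFFFF0218 needs both halves: typically lui + ori
    li   $v0, 10         # exit syscall
    syscall
```

After assembling, compare the "Basic" and "Source" columns in the text segment view to see which native instructions were generated for each line.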

So you are technically loading a 32-bit (word) value into $t0 (even though ' ' is only the value 32, which easily fits into a byte).

But since you know you loaded the ASCII space into $t0, it does not matter that $t0 is 32 bits wide: storing a single byte into memory is enough, for example when you are building a new string in a buffer and want to put a space character into it. So sb $t0, 0($t2) is correct. If $t0 held some larger value, the upper 24 bits would be ignored and only the low 8 bits would be written to memory by sb (effectively truncating the value in memory; you cannot read the full value back, only the truncated part).
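The truncation can be observed with a sketch like the following (MARS/SPIM syntax assumed; `buffer` and the constant 0x12345620 are made up for illustration). Both stores put the same byte 0x20 into memory, although the second register holds a much larger value:

```mips
.data
buffer: .space 4             # hypothetical 4-byte scratch buffer
.text
    la   $t2, buffer         # $t2 = address of the buffer
    li   $t0, ' '            # $t0 = 0x00000020
    sb   $t0, 0($t2)         # buffer[0] = 0x20
    li   $t3, 0x12345620     # larger value whose low 8 bits are also 0x20
    sb   $t3, 1($t2)         # buffer[1] = 0x20; the upper 24 bits are not stored
    lbu  $t4, 0($t2)         # read both bytes back, zero-extended
    lbu  $t5, 1($t2)
    tne  $t4, $t5            # no trap: both bytes read back as 0x20
    li   $v0, 10             # exit syscall
    syscall
```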

The conversion in the other direction happens often in MIPS assembly too, because for example lb reads only 8 bits from memory but sign-extends them into the full 32-bit register. If you don't pay attention to your values, you can easily trap yourself, for example like this:

.data
test_value: .byte 234
.text
    li      $t0, 234
    lb      $t1, test_value
    tne     $t0, $t1     # throw exception if t0 is not equal to t1
    # terminate normally when values are equal
    li      $v0, 10
    syscall

At first read this may look like 234 is compared against 234, so the program should terminate normally, but if you run it, it ends with an exception at the tne instruction. That is because lb sign-extends the value, and 234 fits in 8 bits only when the bit pattern is interpreted as an unsigned 8-bit integer; interpreted as a signed 8-bit integer, the same bit pattern (0xEA) is the value -22 (234 - 256). And -22 does not equal 234.

If you change the lb instruction to lbu, which loads an unsigned (zero-extended) byte, the code works and exits normally, as tne then compares 234 against 234, which are equal.
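A sketch of the same test with both loads side by side (MARS/SPIM syntax assumed; the `andi` mask is an extra alternative added here for illustration, not something the original program used):

```mips
.data
test_value: .byte 234        # bit pattern 0xEA
.text
    li   $t0, 234            # $t0 = 0x000000EA
    lb   $t1, test_value     # sign-extended:  $t1 = 0xFFFFFFEA = -22
    lbu  $t2, test_value     # zero-extended:  $t2 = 0x000000EA = 234
    tne  $t0, $t2            # no trap: 234 == 234
    andi $t1, $t1, 0xFF      # clear the sign-extension bits: $t1 = 0x000000EA
    tne  $t0, $t1            # no trap after masking either
    li   $v0, 10             # exit syscall
    syscall
```

Using lbu is the more direct fix; masking with andi is mainly useful when the sign-extended value is already sitting in a register.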

So when programming in assembly, you should definitely be aware of the type of data you process, and correctly extend/truncate values as needed.

(BTW the MARS assembler will warn you that 234 does not fit a signed byte and may be truncated, but values up to 255 actually do fit in 8 bits; they just need to be interpreted as unsigned. Values above 255 are truly truncated, as some bits are lost completely: for example .byte 1025 stores only the value 1 in memory.)
