
In the RISC MIPS instruction set, we have load byte unsigned (lbu), load halfword unsigned (lhu) and load word (lw) instructions. It appears to me that everything lbu and lhu can do can be achieved with lw.

So why did the MIPS designers introduce lbu and lhu? In what circumstances (preferably non-obscure ones) might they be useful? Perhaps lw takes longer than lbu to execute, even though both are single instructions?

flow2k
  • The original Alpha didn't have sub-word load/store instructions. You can get pretty far without them, but the Alpha eventually had to extend the instruction set with these instructions for multiprocessor systems. – EOF Nov 26 '16 at 16:10

1 Answer

lw requires the address that you load from to be word-aligned (i.e. the address must be a multiple of 4).

So let's say that you have this array located at address 0x1000:

array: .byte 0,1,2,3

And you want to load the second byte (1), which is located at address 0x1001. That address isn't word-aligned, so an lw from it won't work. Instead you would have to do an lw from address 0x1000 and then perform some shifting and ANDing to get the byte you wanted, which would be a real hassle as a programmer.
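To make the "shifting and ANDing" concrete, here is a minimal C sketch that models the workaround. It is an illustration only, not MIPS code: `load_word`, `load_byte_via_word`, the `mem` array standing in for memory, and the `big_endian` flag are all hypothetical names introduced for this example, and `addr` is an offset into `mem` rather than a real address such as 0x1000.

```c
#include <stdint.h>

/* Model of an aligned word load (lw): combines the four bytes at addr into
   one 32-bit value. addr is assumed to be a multiple of 4, as lw requires. */
static uint32_t load_word(const uint8_t *mem, uint32_t addr, int big_endian)
{
    const uint8_t *p = mem + addr;
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* What lbu does in one instruction, rebuilt from an aligned word load. */
static uint32_t load_byte_via_word(const uint8_t *mem, uint32_t addr, int big_endian)
{
    uint32_t aligned = addr & ~(uint32_t)3;  /* round down to a word boundary */
    uint32_t offset  = addr & 3;             /* byte position within the word */
    uint32_t word    = load_word(mem, aligned, big_endian);
    /* On big-endian the lowest address holds the most significant byte. */
    uint32_t shift   = big_endian ? (3 - offset) * 8 : offset * 8;
    return (word >> shift) & 0xFF;           /* zero-extended, like lbu */
}
```

With `mem` holding the bytes 0, 1, 2, 3, calling `load_byte_via_word(mem, 1, big_endian)` yields 1 for either byte order, but it takes an address mask, a word load, a shift and an AND to get there, where lbu needs a single instruction.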

Or let's say you wanted to load the 0, which is located at a word-aligned address, and compare it against some value. If you do an lw from address 0x1000, your target register will contain either 0x00010203 or 0x03020100 (depending on the endianness) rather than just 0. So before performing the comparison you'll have to shift or mask the register to extract the byte you wanted.
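The comparison pitfall can be sketched in C as well. The two constants are the register values quoted above; `first_byte` and its `big_endian` flag are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Register contents after lw from 0x1000 for the array 0,1,2,3. */
#define WORD_BIG_ENDIAN    0x00010203u
#define WORD_LITTLE_ENDIAN 0x03020100u

/* Extract the byte stored at the lowest address (the 0 in the array). */
static uint32_t first_byte(uint32_t word, int big_endian)
{
    return big_endian ? (word >> 24)    /* most significant byte: shift down */
                      : (word & 0xFF);  /* least significant byte: mask */
}
```

Comparing either raw word against 0 fails, because the neighbouring bytes are still in the register. Only after `first_byte` has isolated the byte does the comparison see the intended value, and which operation is needed depends on the byte order.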

As I'm sure you can see, it would be very inconvenient to have to do these extra steps whenever you want to process individual bytes of data - which in most programs is a pretty common operation.

Michael
  • Michael - thank you. With regard to the doubt I had about the execution time, I suppose `lw` and `lbu` would take the same amount of time, right? That is, if we needed to wait for memory, the required wait time would not differ between the two, correct? – flow2k Nov 27 '16 at 04:00
  • @flow2k: zero-extending the byte into a 32-bit register is essentially free in hardware, and hopefully a good implementation can efficiently choose the right word to fetch from cache. I'd expect LW and LBU to perform the same, but it's possible that LBU has slightly higher latency on some implementations. If LHU works on unaligned halfwords, it could need to load one byte from each of two neighbouring cache lines. It might be slower all the time, or only slower on cache-line splits. (Totally making stuff up here based on my knowledge of x86 and CPU architecture; IDK what MIPS HW is like). – Peter Cordes Nov 27 '16 at 06:14