
I need to rewrite a simple piece of MIPS code so that it runs as fast as possible. I don't know how to optimise it and would appreciate some help.

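.data
a: .byte 0
b: .byte 0
c: .byte 0
.text

abc:                     # $s0, $s1, $s2 hold the addresses of a, b, c
lb   $t0, 0($s0)         # $t0 = a
addi $t0, $t0, 1
sb   $t0, 0($s0)         # a = a + 1
lb   $t0, 0($s1)         # $t0 = b
addi $t0, $t0, 2
sb   $t0, 0($s1)         # b = b + 2
lb   $t0, 0($s2)         # $t0 = c
addi $t0, $t0, 3
sb   $t0, 0($s2)         # c = c + 3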
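For reference, the routine's effect can be modelled in C. This is a hypothetical sketch, not the asker's code: `abc_original` and `abc_hoisted` are made-up names, the pointers `pa`, `pb` and `pc` stand in for `$s0`, `$s1` and `$s2`, and `uint8_t` is used because only the low byte of `$t0` is stored back, so the sign extension done by `lb` does not change the result. The second function hoists all three loads ahead of the stores, a common way to hide load latency on pipelined cores, and shows why that reordering is only safe when the three addresses are known to be distinct.

```c
#include <stdint.h>

/* Hypothetical C model of the MIPS routine `abc`: pa, pb and pc play the
   roles of $s0, $s1 and $s2. Each byte is loaded, updated and stored
   before the next load, exactly as in the assembly. uint8_t arithmetic
   wraps modulo 256, matching the byte that `sb` writes back. */
static void abc_original(uint8_t *pa, uint8_t *pb, uint8_t *pc) {
    *pa = (uint8_t)(*pa + 1);
    *pb = (uint8_t)(*pb + 2);
    *pc = (uint8_t)(*pc + 3);
}

/* Same updates, but with all three loads hoisted ahead of the stores
   to hide load latency. Equivalent to abc_original only when pa, pb
   and pc point to three distinct bytes. */
static void abc_hoisted(uint8_t *pa, uint8_t *pb, uint8_t *pc) {
    uint8_t a = *pa;          /* all loads first ... */
    uint8_t b = *pb;
    uint8_t c = *pc;
    *pa = (uint8_t)(a + 1);   /* ... then all stores */
    *pb = (uint8_t)(b + 2);
    *pc = (uint8_t)(c + 3);
}
```

For example, with `s = t = 0`, calling `abc_original(&s, &s, &t)` (so `$s0 == $s1`) leaves `s == 3`, whereas `abc_hoisted(&s, &s, &t)` leaves `s == 2`. With three distinct bytes starting at 0, both versions produce 1, 2 and 3.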
Peter Cordes
  • Does this code work? What does it do? What's supposed to be in `$s0` & `$s1`? How are we supposed to help without details or context? Broadly speaking, the approach to optimization is to use memory less and registers more, and when we do use memory, schedule loads an instruction or two before they are needed. – Erik Eidt Mar 30 '21 at 09:58
  • The base addresses of a, b and c are in registers `$s0`, `$s1` and `$s2`. The code works. There's no need to worry about other factors; I just need to rewrite it so that it runs as fast as possible. – Jason Chan Mar 30 '21 at 10:03
  • Why not use `$s0` directly for `a` (instead of using `$s0` as a pointer to `a`)? That would be fastest. – Erik Eidt Mar 30 '21 at 10:37
  • Faster on what MIPS microarchitecture? Classic R2000? R10k? A MIPS32r6 core? Can you assume that `a:` is word-aligned and that there's padding to allow a word load + word store? Does your MIPS have MSA ([MIPS SIMD Architecture](https://www.mips.com/products/architectures/ase/simd/)) that would allow one instruction to do four byte adds in parallel, without carry propagation between bytes? Otherwise you'd need SWAR techniques to squash the carry-out, but probably still worth it if it allows a single word load + word store. – Peter Cordes Mar 30 '21 at 12:40
  • Also, I assume `.byte0` is supposed to be `.byte 0`, i.e. one byte with value `0`. – Peter Cordes Mar 30 '21 at 12:43
  • The optimization opportunities here are limited due to the possibility of aliasing (if s0 = s1, for example) and because it does so few things with no loops. The real optimization is not in this code but in its interaction with the code that calls this code. If there are correlations between the calls (e.g., s0 is walking through an array), then the code could reorder the memory operations to batch the accesses or use SIMD to operate on multiple bytes at once. – Raymond Chen Mar 30 '21 at 13:11
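The single-word SWAR approach mentioned above can be sketched in C. This assumes, as the question does not guarantee, that `a`, `b` and `c` are the three adjacent bytes laid out by the `.data` section (rather than arbitrary, possibly aliasing addresses), that the block starts on a 4-byte boundary, and that the fourth byte is padding that may be rewritten with its own value. The names `add_bytes_swar`, `abc_swar` and `HIGH_BITS` are hypothetical. Masking off the top bit of each byte keeps every lane's partial sum at most `0xFE`, so no carry crosses into the next byte; XOR-ing the original top bits back in then fixes each lane's high bit and discards its carry-out. Building the addend from a byte array with `memcpy` keeps the sketch independent of the core's endianness.

```c
#include <stdint.h>
#include <string.h>

#define HIGH_BITS 0x80808080u  /* top bit of each of the four byte lanes */

/* Add each byte of y to the matching byte of x, modulo 256, with no
   carry propagating into the neighbouring byte. */
static uint32_t add_bytes_swar(uint32_t x, uint32_t y) {
    /* Low 7 bits of each lane: max 0x7F + 0x7F = 0xFE, so no cross-lane carry. */
    uint32_t low_sum = (x & ~HIGH_BITS) + (y & ~HIGH_BITS);
    /* Top bit of each lane = x7 ^ y7 ^ carry-in; the carry-out is dropped. */
    return low_sum ^ ((x ^ y) & HIGH_BITS);
}

/* Hypothetical single-word version of `abc`: mem holds a, b, c and one
   padding byte, assumed to be 4-byte aligned in the real MIPS layout. */
static void abc_swar(uint8_t mem[4]) {
    static const uint8_t inc_bytes[4] = {1, 2, 3, 0}; /* a += 1, b += 2, c += 3, pad += 0 */
    uint32_t word, inc;
    memcpy(&word, mem, sizeof word);      /* models one word load (lw) */
    memcpy(&inc, inc_bytes, sizeof inc);  /* same byte layout as mem, so endian-neutral */
    word = add_bytes_swar(word, inc);
    memcpy(mem, &word, sizeof word);      /* models one word store (sw) */
}
```

On MIPS this would roughly become one `lw`, a short `and`/`addu`/`xor` sequence with the constants built by `lui`/`ori`, and one `sw`, in place of three `lb`/`sb` pairs. Because the addend `{1, 2, 3, 0}` has no high bits set, the XOR term reduces to `x & 0x80808080`. Whether this actually beats three byte loads and stores depends on the target core.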
