How to simplify MIPS ADD 1 / SLT / BNE into fewer Instructions?

Question

How can these MIPS instructions be reduced to fewer instructions?

addi  $8, $3, 1
slt   $9, $2, $8
bne   $9, $0, End

Probably `<=` instead of `++` and `<`. There is a pseudo-instruction `blt` or `ble` but that doesn't really count as fewer instructions. There's also `bgez` as a real HW instruction, maybe `sub` / `bgez` could also work like `sle` / `bne`. Or maybe not, and `sle` is the more obvious one for sure. — Peter Cordes, Oct 01 '19 at 03:31

Erik Eidt · Answer 1 · 2019-10-01T04:23:13.410

How to Simplify these MIPS Instructions?

slt $9, $3, $2
beq $9, $0, End

Here's one way to reason it out. You want to do:

if ( $2 < $3+1 ) goto End;

We convert that to remove the addition:

if ( $2 <= $3 ) goto End;

But MIPS has no <= comparison, so we rewrite the condition as the negation of its opposite. The two negations cancel out, so this still represents the same logic, and it removes the equality component of the comparison:

if ( ! ( $2 > $3 ) ) goto End;

now, we swap the operands... since MIPS also doesn't have > :

if ( ! ( $3 < $2 ) ) goto End;

(NB: This swapping of the operands does not negate the condition: for this kind of swap we keep the equality component the same (here, absent) when flipping the operator; whereas in negation as in the preceding step, we flip the operator and also change its equality component.)

The good news is that we can perform this in only two instructions, because the negation can be folded into the branch instruction by using beq (branch on false) instead of bne (branch on true).
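The chain of rewrites above can be sanity-checked in C. This is only a minimal sketch, not part of the original reasoning: the function names are made up for illustration, x stands for $2 and y for $3, and y == INT_MAX is deliberately excluded because y + 1 would overflow there.

```c
#include <assert.h>
#include <limits.h>

/* Original condition: $2 < $3 + 1.
   Precondition: y != INT_MAX, otherwise y + 1 overflows (UB in C). */
int cond_original(int x, int y) { return x < y + 1; }

/* Addition removed: $2 <= $3. */
int cond_le(int x, int y) { return x <= y; }

/* Negated and operands swapped: !($3 < $2).
   This is what slt $9, $3, $2 / beq $9, $0, End tests. */
int cond_not_slt(int x, int y) { return !(y < x); }

/* Returns 1 when all three forms give the same answer. */
int forms_agree(int x, int y)
{
    return cond_original(x, y) == cond_le(x, y)
        && cond_le(x, y) == cond_not_slt(x, y);
}

/* Checks a small range around zero plus one value near the limits. */
void check_small_range(void)
{
    for (int x = -4; x <= 4; x++)
        for (int y = -4; y <= 4; y++)
            assert(forms_agree(x, y));
    assert(forms_agree(INT_MIN, INT_MAX - 1));
}
```

Calling check_small_range() from a test harness exercises all three forms; it says nothing about y == INT_MAX, which is outside the precondition.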

In fact, if you use the ble pseudo-instruction, you'll get the same two-instruction sequence as above.

ble $2, $3, End

As an aside, sle is a poor option, depending on your criteria.

MIPS doesn't have sle as a hardware instruction; it is a pseudo-instruction.

sle $9, $2, $3 generates:

slt $9, $3, $2    # generate the opposite condition
ori $1, $0, 0x1   # generate the constant 1
sub $9, $1, $9    # generate 1 - "the opposite condition"

As you can see it puts several additional instructions toward making the exact 1 vs. 0 answer we ought to get for sle, to which you'll still have to add a branch instruction, so that makes 4 instructions! (And we could have branched on false after the initial instruction of that expansion.) There also is no "reverse subtract immediate", so an R-Type subtract is used with a separately generated constant.
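To make the comparison concrete, here is a rough C model of the sle expansion next to the slt + beq pair. It is an illustrative sketch only: the function names are hypothetical, and each C statement stands in for one of the instructions quoted above.

```c
#include <assert.h>

/* Models the three-instruction expansion of the sle pseudo-instruction. */
int sle_expanded(int a, int b)
{
    int opposite = (b < a);  /* slt: 1 when a > b, else 0 */
    int one = 1;             /* ori: materialise the constant 1 */
    return one - opposite;   /* sub: 1 - opposite, i.e. a <= b */
}

/* Models slt + beq: the branch is taken when the slt result is 0. */
int slt_beq_taken(int a, int b)
{
    return (b < a) == 0;
}

/* The extra ori/sub only normalise the value; the branch decision
   is already available after the slt. */
void check_sle_matches_branch(void)
{
    for (int a = -2; a <= 2; a++)
        for (int b = -2; b <= 2; b++)
            assert(sle_expanded(a, b) == slt_beq_taken(a, b));
}
```

Both functions agree for every pair tested, which is the point of branching on false after the first slt instead of building the exact 0/1 value.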

Does this transformation work even with the possibility of signed wraparound? (i.e. would it be safe for `addiu` as well as `addi`?) I think not: `<= INT_MAX` is a lot different from `< INT_MIN`. So maybe worth mentioning that `addi` faults on that wraparound and we're *not* replicating the behaviour for that input (because this version can't raise an integer overflow exception). — Peter Cordes, Oct 01 '19 at 04:18
I checked, compilers don't do this optimization because it's not safe unless you rule out `y==INT_MAX`, or when signed overflow is UB. — Peter Cordes, Oct 01 '19 at 13:22

Peter Cordes · Answer 2 · 2019-10-02T05:06:32.133

First of all, you can use a pseudoinstruction like blt $2, $8, End to write the SLT and BNEZ on one source line. But I doubt that's what you mean; the rest of the answer is only concerned with reducing the number of hardware MIPS instructions.

Unless otherwise stated, an optimization is only safe / allowed if it gives the same behaviour for every possible input. For optimizing asm -> asm, you need to know which parts of the behaviour are desired and which are just implementation details that don't need to be preserved.

In this case, I think we're meant to assume that creating $8 = $3+1 is not part of the visible / desired behaviour. I think the point is just to branch or fall-through using fewer than 3 instructions, regardless of what temporary we create or not. (e.g. subu $8, $2, $3 / bgez might be an option.)

You can only optimize/simplify this if you change behaviour for the $3 = INT_MAX corner case. (Or if you have some guarantees about the possible ranges of your inputs).

In the original, MIPS addi traps on signed overflow. (That's why it's normally never used; addiu wraps like you'd expect and is otherwise identical.) Raising this exception for $3 = 0x7FFFFFFF and not for any other case pretty much requires that you use addi with an immediate 1. That's part of the behaviour of the original sequence and nothing in the question tells us we're allowed to relax that.
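As a rough illustration of which inputs trap, the overflow condition for addi can be modelled in C by doing the addition in a wider type. This is a sketch under assumptions, not a description of the hardware: addi_would_trap is a hypothetical helper name, and the 16-bit immediate is passed as an already sign-extended value.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when addi rt, rs, imm would raise an integer overflow
   exception, i.e. when the exact sum does not fit in 32 bits. */
int addi_would_trap(int32_t rs, int16_t imm)
{
    int64_t exact = (int64_t)rs + imm;   /* cannot overflow in 64 bits */
    return exact > INT32_MAX || exact < INT32_MIN;
}

/* With an immediate of 1, only rs = 0x7FFFFFFF overflows. */
void check_addi_plus_one(void)
{
    assert(addi_would_trap(INT32_MAX, 1));
    assert(!addi_would_trap(INT32_MAX - 1, 1));
    assert(!addi_would_trap(INT32_MIN, 1));
}
```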

If you had used addiu to implement jump if ($2 < $3+1) with 2's complement wraparound, then INT_MAX is still a special case: x <= 0x7FFFFFFF is true for all x, but x < -0x80000000 is false for all x (32-bit 2's complement, which is what MIPS slt implements). x < y+1 is equivalent to x <= y, but only if y+1 doesn't wrap.
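The corner case can be shown directly. The following is a small sketch, assuming 32-bit int and an unsigned-to-int conversion that keeps the bit pattern (which is what GCC documents); the helper names are invented for this example.

```c
#include <assert.h>
#include <limits.h>

/* y + 1 with wraparound, like addiu. ASSUMPTION: 32-bit int and an
   unsigned -> int conversion that keeps the bit pattern (as GCC does). */
int wrapping_inc(int y)
{
    unsigned tmp = (unsigned)y;
    tmp++;                      /* unsigned arithmetic wraps without UB */
    return (int)tmp;
}

/* The addiu / slt version of the condition. */
int lt_wrapped_inc(int x, int y) { return x < wrapping_inc(y); }

/* The "simplified" condition. */
int le_plain(int x, int y) { return x <= y; }

/* The two conditions agree in general but disagree at y == INT_MAX. */
void check_int_max_corner(void)
{
    assert(wrapping_inc(INT_MAX) == INT_MIN);
    assert(le_plain(0, INT_MAX) == 1);
    assert(lt_wrapped_inc(0, INT_MAX) == 0);
    assert(lt_wrapped_inc(0, 5) == le_plain(0, 5));
}
```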

e.g. in C, signed integer overflow is UB, but unsigned wraps. Converting an out-of-range unsigned value to signed is implementation-defined in C (and well-defined as modulo-reducing since C++20); GCC documents it as modulo-reducing, which means 2's complement machines (like MIPS) can just take the unsigned bit-pattern as signed, i.e. the cast is free and just a type-pun.

// C equivalent to asm using `addiu`.  C for MIPS uses 32-bit unsigned and 2's complement int
void foo_wrapping(int x, int y)
{
    unsigned tmp = y;
    tmp++;               // wraps without UB
    int wrapped_yp1 = tmp;
    // y+1 with 2's complement wraparound, without using -fwrapv

    if (! (x < wrapped_yp1))
        sink = 0;
}

GCC5.4 for MIPS compiles it as follows (Godbolt). Register numbers are different from your question, but the pattern is identical. (add result on the right-hand side of SLT, then a BNEZ on that SLT result.)

(add vs. addu sort of happens to match up with C signed overflow being UB, unsigned just wrapping, but note that C compilers use addu/addiu even for signed addition because one thing UB is allowed to do is wrap. C doesn't require it to be detected and crash, the whole point is that it allows the optimizer to assume it doesn't happen at all.)

# gcc5.4 -O3
foo_wrapping(int, int):
        addiu   $5,$5,1
        slt     $4,$4,$5
        bne     $4,$0,$L7
     ... a store that it conditionally jumps over, then jr $ra

clang9.0 also emits the same code. Compilers not finding optimizations isn't proof that none are possible, but it's a nice check that my reasoning above is correct.

Using plain y+1 and compiling with gcc or clang -fwrapv (to make signed overflow well-defined as 2's complement wraparound) also gives the same result. Adding if (y == 0x7FFFFFFF) __builtin_unreachable(); then works correctly to promise the compiler that y!=INT_MAX, letting the compiler optimize. Compilers miss that optimization in the version using unsigned.

If we exclude INT_MAX as a possible input (i.e. we don't care about the case where the original code trapped), then we can represent the operation in C as x < y+1 for signed int vars. In C, signed-overflow is Undefined Behaviour so optimizing compilers are allowed to assume that y+1 doesn't overflow, and thus that y!=INT_MAX.

// This C is not equivalent to your asm.
// It doesn't trap on y==0x7FFFFFFF, and it doesn't necessarily wrap like addiu
int sink;
void foo_nooverflow(int x, int y) {
    if (x < y+1) {
        sink = 0;      // a store can't be done branchlessly
    }
}

# gcc5.4 -O3
foo_nooverflow(int, int):
        slt     $4,$5,$4
        beq     $4,$0,$L4

So gcc/clang transform the condition as Erik Eidt's answer suggests, comparing in the other order and swapping bne for beq as a way to implement jump if ($2 <= $3) because MIPS only has a limited selection of sXX instructions.

A version using unsigned x, y also has to addiu separately from the compare, and then uses a different compare instruction: sltu. slt is a signed (2's complement) comparison.

The distinction is important when either input has its high bit set.
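A short sketch of that difference, modelling slt and sltu as C comparisons on the same 32-bit patterns. The function names are hypothetical; the conversion from int32_t to uint32_t used here is well-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* slt: compares the operands as signed 2's complement values. */
int slt_model(int32_t a, int32_t b) { return a < b; }

/* sltu: compares the same bit patterns as unsigned values. */
int sltu_model(int32_t a, int32_t b)
{
    return (uint32_t)a < (uint32_t)b;
}

/* -1 has the bit pattern 0xFFFFFFFF: below 1 when signed,
   far above 1 when unsigned. */
void check_high_bit(void)
{
    assert(slt_model(-1, 1) == 1);
    assert(sltu_model(-1, 1) == 0);
    assert(slt_model(1, 2) == sltu_model(1, 2));
}
```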
