In the normal MIPS calling convention, args after the 4th will already be stored on the call stack, placed there by your caller.
The standard calling convention leaves padding before stack args, where you could store the register args to create a contiguous array of all the args. This PDF has a diagram; see also "MIPS function call with more than four arguments".
This is normally called "shadow space" in x86-64 Windows. But MIPS jal doesn't store anything to memory (unlike x86 call, which pushes a return address on the stack, MIPS puts the return address in $ra), so even if the calling convention didn't include this shadow space, a function could still adjust SP first and then store register args contiguous with stack args. So the only benefit I can see is giving tiny functions extra scratch space without having to adjust the stack pointer. This is less useful than on x86-64, where it isn't easily possible to create an array of args without it.
Or you could peel the first 3 sum iterations that handle $a1 .. $a3 (again assuming the standard MIPS calling convention with the first 4 args in registers, $a0 being int n). Then loop over stack args if you haven't got to n yet.
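As a rough C model of that peeling idea (not real MIPS code, just a sketch): the hypothetical function sum_peeled takes the three register args as ordinary parameters, and a pointer stack_args that stands in for the address of the first stack-passed arg. Both names are made up for illustration.

```c
/* Hypothetical model: a1..a3 stand in for $a1..$a3, and stack_args
 * stands in for the address of the first stack-passed arg. */
int sum_peeled(int n, int a1, int a2, int a3, const int *stack_args)
{
    int sum = 0;

    /* Peeled iterations: consume the register args without touching memory. */
    if (n < 1) return sum;
    sum += a1;
    if (n < 2) return sum;
    sum += a2;
    if (n < 3) return sum;
    sum += a3;

    /* Any remaining args are contiguous in memory, so a plain loop works. */
    for (int i = 3; i < n; i++)
        sum += stack_args[i - 3];
    return sum;
}
```

In asm, the early-out checks become branches on $a0 and the final loop becomes a pointer walk starting at the first stack arg, so the register args never need to be stored.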
You could write a C function and look at optimized compiler output, like this:
#include <stdarg.h>
int sumargs(int n, ...) {
    va_list args;
    va_start(args, n);
    int sum=0;
    for (int i=0 ; i<n ; i++){
        sum += va_arg(args, int);
    }
    va_end(args);
    return sum;
}
va_start and va_arg aren't real functions; they'll expand to some inline code. va_start(args,n) dumps the arg-passing registers after n into the shadow space (contiguous with stack args, if any).
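Once the register args sit next to the stack args, reading them is just a pointer walk. Here is a minimal C model of that, assuming every arg is a 4-byte int. The function sum_arg_area and its arg_area parameter are hypothetical names, not what the compiler actually generates.

```c
/* Hypothetical model of the arg area after va_start on standard MIPS:
 * slot 0 corresponds to $a1, slot 1 to $a2, slot 2 to $a3, and any
 * further slots are the stack args the caller already stored. */
int sum_arg_area(int n, const int *arg_area)
{
    const int *ap = arg_area;   /* roughly what va_start sets up */
    int sum = 0;

    for (int i = 0; i < n; i++)
        sum += *ap++;           /* roughly what va_arg(args, int) does here */
    return sum;
}
```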
MIPS gcc unfortunately doesn't support the -mregnames option to use names like $a0 and $t0, but google found a nice table of register name<->number.
MIPS asm output from the Godbolt compiler explorer:
# gcc5.4 -O3 -fno-delayed-branch
sumargs(int, ...):
    # on entry: SP points 16 bytes below the first non-register arg, if there is one.
    addiu $sp,$sp,-16     # reserve another 16 bytes
    addiu $3,$sp,20       # create a pointer to the base of this array
    sw    $5,20($sp)      # dump $a1..$a3 into the shadow space
    sw    $6,24($sp)
    sw    $7,28($sp)
    sw    $3,8($sp)       # spill the pointer into scratch space for some reason?
    blez  $4,$L4          # check if the loop should run 0 times.
    nop                   # branch-delay slot. (MARS can simulate a MIPS without delayed branches, so I told gcc to fill the slots with nops)
    move  $5,$0           # i=0
    move  $2,$0           # $v0 = sum = 0
$L3:                      # do {
    lw    $6,0($3)
    addiu $5,$5,1         # i++
    addu  $2,$2,$6        # sum += *arg_pointer
    addiu $3,$3,4         # arg_pointer++ (4 bytes)
    bne   $4,$5,$L3       # } while(i != n)
    nop                   # fill the branch-delay slot
$L2:
    addiu $sp,$sp,16
    j     $31             # return (with sum in $v0)
    nop
$L4:
    move  $2,$0           # return 0
    b     $L2
    nop
Looping on do {}while(--n) would have been more efficient. It's a missed optimization that gcc doesn't do this when compiling the for loop.
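For reference, here is a sketch of the same function with the loop written by hand in do {}while(--n) form. The name sumargs_countdown is made up for this example. Whether a given compiler turns this into a tighter loop is something to check on Godbolt rather than assume.

```c
#include <stdarg.h>

/* Returns the same sum as the for-loop version, but counts n down to zero
 * instead of counting a separate i up to n. */
int sumargs_countdown(int n, ...)
{
    va_list args;
    int sum = 0;

    va_start(args, n);
    if (n > 0) {                 /* guard: a do-while body runs at least once */
        do {
            sum += va_arg(args, int);
        } while (--n);           /* loop counter and exit test in one register */
    }
    va_end(args);
    return sum;
}
```

The point of the countdown form is that the loop needs only n itself as a counter, so the separate i++ and the compare against n drop out of the loop body.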