I think you're looking at this the wrong way. Instead of popping the old variables off the stack and then pushing the new ones, simply reassign the ones already there (carefully). This is roughly the same optimization that would happen if you rewrote the code to be the equivalent iterative algorithm.
For this code:
int fact(int x, int total=1) {
if (x == 1)
return total;
return fact(x-1, total*x);
}
would be
fact:
jmpne x, 1, fact_cont # if x!=1 jump to multiply
retrn total # return total
fact_cont: # update variables for "recursion
mul total,x,total # total=total*x
sub x,1,x # x=x-1
jmp fact #"recurse"
There's no need to pop or push anything on the stack, merely reassign.
Clearly, this can be further optimized, by putting the exit condition second, allowing us to skip a jump, resulting in fewer operations.
fact_cont: # update variables for "recursion
mul total,x,total # total=total*x
sub x,1,x # x=x-1
fact:
jmpne x, 1, fact_cont # if x!=1 jump to multiply
retrn total # return total
Looking again, this "assembly" better reflects this C++, which clearly has avoided the recursion calls
int fact(int x, int total=1)
for( ; x>1; --x)
total*=x;
return total;
}