How to implement tail calls in a custom VM

Question

How can I implement tail calls in a custom virtual machine?

I know that I need to pop off the original function's local stack, then it's arguments, then push on the new arguments. But, if I pop off the function's local stack, how am I supposed to push on the new arguments? They've just been popped off the stack.

score 4 · Accepted Answer · edited May 13 '10 at 16:56

I take it for granted that we're discussing a traditional "stack-based" virtual machine here.

You pop off the current function's local stack preserving the still-relevant parts in non-stack "registers" (where the "relevant parts" are, clearly, the argument for the forthcoming recursive tail call), then (once all of the function's local stack and arguments are cleaned up) you push the arguments for the recursive call. E.g., suppose the function you're optimizing is something like:

def aux(n, tot):
  if n <= 1: return tot
  return aux(n-1, tot * n)

which without optimization might produce byte-code symbolically like:

AUX:   LOAD_VAR   N
       LOAD_CONST 1
       COMPARE
       JUMPIF_GT  LAB
       LOAD_VAR   TOT
       RETURN_VAL
LAB:   LOAD_VAR   N
       LOAD_CONST 1
       SUBTRACT
       LOAD_VAR   TOT
       LOAD_VAR   N
       MULTIPLY
       CALL_FUN2  AUX
       RETURN_VAL

the CALL_FUN2 means "call a function with two arguments". With the optimization, it could become sometime like:

   POP_KEEP     2
   POP_DISCARD  2
   PUSH_KEPT    2
   JUMP         AUX

Of course I'm making up my symbolic bytecodes as I go along, but I hope the intent is clear: POP_DISCARD n is the normal pop that just discards the top n entries from the stack, but POP_KEEP n is a variant that keeps them "somewhere" (e.g. in an auxiliary stack not directly accessible to the application but only to the VM's own machinery -- storage with such a character is sometimes called "a register" when discussing VM implementation) and a matching PUSH_KEPT n which empties the "registers" back into the VM's normal stack.

The trouble with that is that the types (and their sizes) in question are totally unknown. I could transfer to an internal stack, but that would limit the total argument size. I guess if I did something like, 200 bytes, then that's more than any sane person would want to transfer anyway. Cheers. — Puppy, May 13 '10 at 15:49
@DeadMG, to deal with arbitrary types unknown at compile time the usual approach is to pass around _pointers_ (e.g., in the CPython VM, pointers to `PyObject`) -- or equivalently, references, if your implementation language does not use pointers. Then, the size of what goes on and off the stack is perfectly well known -- `sizeof(whatever*)`, e.g., 4 bytes per object in a 32-bit machine. — Alex Martelli, May 13 '10 at 16:23
If I only store pointers on the stack, where the hell am I gonna put what they point to? — Puppy, May 13 '10 at 21:16
@DeadMG, they will typically point to space allocated by `malloc` (or equivalent ways in your implementation language, which you haven't yet mentioned): that memory space is usually called the **heap**. — Alex Martelli, May 13 '10 at 21:21

Mooing Duck · Answer 2 · 2013-12-20T17:38:30.753

I think you're looking at this the wrong way. Instead of popping the old variables off the stack and then pushing the new ones, simply reassign the ones already there (carefully). This is roughly the same optimization that would happen if you rewrote the code to be the equivalent iterative algorithm.

For this code:

 int fact(int x, int total=1) {
     if (x == 1)
         return total;
     return fact(x-1, total*x);
 }

would be

 fact:
   jmpne x, 1, fact_cont  # if x!=1 jump to multiply
   retrn total            # return total
 fact_cont:               # update variables for "recursion
   mul total,x,total      # total=total*x
   sub x,1,x              # x=x-1
   jmp fact               #"recurse"

There's no need to pop or push anything on the stack, merely reassign.
Clearly, this can be further optimized, by putting the exit condition second, allowing us to skip a jump, resulting in fewer operations.

 fact_cont:               # update variables for "recursion
   mul total,x,total      # total=total*x
   sub x,1,x              # x=x-1
 fact:
   jmpne x, 1, fact_cont  # if x!=1 jump to multiply
   retrn total            # return total

Looking again, this "assembly" better reflects this C++, which clearly has avoided the recursion calls

int fact(int x, int total=1)
    for( ; x>1; --x)
        total*=x;
    return total;
}

How to implement tail calls in a custom VM

2 Answers2