Groovy's trampoline() makes recursive execution much slower - why?

Question

I'm experimenting with recursion:

def fac
//fac = { int curr, res = 1G -> 1 >= curr ? res : fac( curr - 1, res * curr ) }
fac = { int curr, res = 1G -> 1 >= curr ? res : fac.trampoline( curr - 1, res * curr ) }
fac = fac.trampoline()

def rnd = new Random()

long s = System.currentTimeMillis()

100000.times{ fac rnd.nextInt( 40 ) }

println "done in ${System.currentTimeMillis() - s} ms / ${fac(40)}"

If I use it like this, I'm getting this:

done in 691 ms

If I uncomment line #2 and comment lines #3-4 to remove trampoline() and run it, I'm getting significantly lower numbers:

done in 335 ms

So, with trampoline the recursion works 2 times slower.

What am I missing?

P.S.

If I run the same example in Scala 2.12:

def fac( curr:Int, acc:BigInt = 1 ):BigInt = if( 1 >= curr ) acc else fac( curr - 1, curr * acc )
val s = System.currentTimeMillis
for( ix <- 0 until 100000 ) fac( scala.util.Random.nextInt(40).toInt )

println( s"done in ${System.currentTimeMillis - s} ms" )

it executes a bit faster:

done in 178 ms

UPDATE

Rewriting the closure to a method with the annotation:

@groovy.transform.TailRecursive
def fac( int curr, res = 1G ) { 1 >= curr ? res : fac( curr - 1, res * curr ) }
// the rest

gives

done in 164 ms

and is super-coll. Nevertheless, I still want to know about trampoline() :)

Testing your code gives me roughly the same numbers. A big jump in performance is gained by writing `fac` out as a proper function. Then with `@groovy.transform.TailRecursive` it gets again a bit faster (quite close actually to the scala versions). — cfrick, Jun 13 '19 at 11:17
I guess because trampoline creates more closures to avoid the stackoverflow...but that's just a guess — tim_yates, Jun 13 '19 at 13:28

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

As stated in the documentation, Closure.trampoline() prevents from overflowing the call stack.

Recursive algorithms are often restricted by a physical limit: the maximum stack height. For example, if you call a method that recursively calls itself too deep, you will eventually receive a StackOverflowException.

An approach that helps in those situations is by using Closure and its trampoline capability.

Closures are wrapped in a TrampolineClosure. Upon calling, a trampolined Closure will call the original Closure waiting for its result. If the outcome of the call is another instance of a TrampolineClosure, created perhaps as a result to a call to the trampoline() method, the Closure will again be invoked. This repetitive invocation of returned trampolined Closures instances will continue until a value other than a trampolined Closure is returned. That value will become the final result of the trampoline. That way, calls are made serially, rather than filling the stack.

^{Source: http://groovy-lang.org/closures.html#_trampoline}

However, using trampoline comes with a cost. Let's take a look at the JVisualVM samples.

Non-trampoline use case

Running an example without trampoline() we get a result in ~441 ms

done in 441 ms / 815915283247897734345611269596115894272000000000

This execution allocates ~2,927,550 objects and consumes around 100 MB of memory.

The CPU has a little to do, and except spending time on main() and run() methods, it spends some cycles on coercing arguments.

The `trampoline()` use case

Introducing the trampoline does change a lot. Firstly, it makes execution time almost two times slower compared to the previous attempt.

done in 856 ms / 815915283247897734345611269596115894272000000000

Secondly, it allocates ~5,931,470 (!!!) objects and consumes ~221 MB of memory. The main difference is that in the previous case a single of $_main_closure1 was used across all executions, and in case of using trampoline - every call to trampoline() method creates:

a new $_main_closure1 object
which gets wrapped with the CurriedClosure<T>
which then gets wrapped with the TrampolineClosure<T>

Only this allocates more than 1,200,000 objects.

If it comes to the CPU, it also has much more things to do. Just look at the numbers:

all calls to TrampolineClosure<T>.<init>() consume 199 ms
using trampoline introduces calls to PojoeMetaMethodSite$PojoCachedMethodSietNoUnwrap.invoke() which in total consume additional 201 ms
all calls to CachedClass$3.initValue() consume in total additional 98.8 ms
all calls to ClosureMetaClass$NormalMethodChooser.chooseMethod() consume in total additional 100 ms

And this is exactly why introducing trampoline in your case makes the code execution much slower.

So why `@TailRecursive` does much better?

In short - @TailRecursive annotation replaces all closures and recursive calls with good old while-loop. The factorial function with @TailRecursive looks something like this at the bytecode level:

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by Fernflower decompiler)
//

package factorial;

import groovy.lang.GroovyObject;
import groovy.lang.MetaClass;
import java.math.BigInteger;
import org.codehaus.groovy.runtime.ScriptBytecodeAdapter;
import org.codehaus.groovy.runtime.dgmimpl.NumberNumberMultiply;
import org.codehaus.groovy.transform.tailrec.GotoRecurHereException;

public class Groovy implements GroovyObject {
    public Groovy() {
        MetaClass var1 = this.$getStaticMetaClass();
        this.metaClass = var1;
    }

    public static BigInteger factorial(int number, BigInteger acc) {
        BigInteger _acc_ = acc;
        int _number_ = number;

        try {
            while(true) {
                try {
                    while(_number_ != 1) {
                        int __number__ = _number_;
                        int var7 = _number_ - 1;
                        _number_ = var7;
                        Number var8 = NumberNumberMultiply.multiply(__number__, _acc_);
                        _acc_ = (BigInteger)ScriptBytecodeAdapter.castToType(var8, BigInteger.class);
                    }

                    BigInteger var4 = _acc_;
                    return var4;
                } catch (GotoRecurHereException var13) {
                    ;
                }
            }
        } finally {
            ;
        }
    }

    public static BigInteger factorial(int number) {
        return factorial(number, (BigInteger)ScriptBytecodeAdapter.castToType(1, BigInteger.class));
    }
}

I have documented this use case on my blog some time ago. You can read the blog post if you want to get more information:

https://e.printstacktrace.blog/tail-recursive-methods-in-groovy/

Groovy's trampoline() makes recursive execution much slower - why?

1 Answers1

Non-trampoline use case

The trampoline() use case

So why @TailRecursive does much better?

The `trampoline()` use case

So why `@TailRecursive` does much better?