
Note: I may have chosen the wrong word in the title; perhaps I'm really talking about polynomial growth here. See the benchmark result at the end of this question.

Let's start with these three recursive generic interfaces that represent immutable stacks:

interface IStack<T>
{
    INonEmptyStack<T, IStack<T>> Push(T x);
}

interface IEmptyStack<T> : IStack<T>
{
    new INonEmptyStack<T, IEmptyStack<T>> Push(T x);
}

interface INonEmptyStack<T, out TStackBeneath> : IStack<T>
    where TStackBeneath : IStack<T>
{
    T Top { get; }
    TStackBeneath Pop();
    new INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x);
}

I've created straightforward implementations of these interfaces: EmptyStack<T> and NonEmptyStack<T, TStackBeneath>.

Update #1: See the code below.

I've noticed the following things about their runtime performance:

  • Pushing 1,000 items onto an EmptyStack<int> for the first time takes more than 7 seconds.
  • Pushing 1,000 items onto an EmptyStack<int> takes virtually no time at all afterwards.
  • Performance gets exponentially worse the more items I push onto the stack.

Update #2:

  • I've finally performed a more precise measurement. See the benchmark code and results below.

  • I've only discovered during these tests that .NET 3.5 doesn't seem to allow generic types with a recursion depth ≥ 100. .NET 4 doesn't seem to have this restriction.

The first two facts make me suspect that the slow performance is not due to my implementation, but rather to the type system: .NET has to instantiate 1,000 distinct closed generic types, i.e.:

  • EmptyStack<int>
  • NonEmptyStack<int, EmptyStack<int>>
  • NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>
  • NonEmptyStack<int, NonEmptyStack<int, NonEmptyStack<int, EmptyStack<int>>>>
  • etc.
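To make this concrete, here is a small sketch (based on the interfaces and classes above) of how every Push yields a more deeply nested closed type; the comments show the compile-time type inferred for each variable:

var s0 = new EmptyStack<int>();   // EmptyStack<int>
var s1 = s0.Push(1);              // INonEmptyStack<int, IEmptyStack<int>>
var s2 = s1.Push(2);              // INonEmptyStack<int, INonEmptyStack<int, IEmptyStack<int>>>
var s3 = s2.Push(3);              // INonEmptyStack<int, INonEmptyStack<int, INonEmptyStack<int, IEmptyStack<int>>>>

// The runtime has to construct a matching closed NonEmptyStack<,> type for each of these.
Console.WriteLine(s3.GetType());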

Questions:

  1. Is my above assessment correct?
  2. If so, why does instantiation of generic types such as T<U>, T<T<U>>, T<T<T<U>>>, and so on get exponentially slower the deeper they are nested?
  3. Are CLR implementations other than .NET (Mono, Silverlight, .NET Compact etc.) known to exhibit the same characteristics?

Off-topic footnote: These types are quite interesting, by the way, because they allow the compiler to catch certain errors such as:

stack.Push(item).Pop().Pop();
//                    ^^^^^^
// causes compile-time error if 'stack' is not known to be non-empty.

Or you can express requirements for certain stack operations:

TStackBeneath PopTwoItems<T, TStackBeneath>
              (INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
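For completeness, one possible body for that method (just a sketch; note the added where-constraint, which the nested INonEmptyStack<,> construction requires):

static TStackBeneath PopTwoItems<T, TStackBeneath>(
    INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> stack)
    where TStackBeneath : IStack<T>
{
    // The outer Pop() returns INonEmptyStack<T, TStackBeneath>,
    // so the second Pop() is statically guaranteed to succeed.
    return stack.Pop().Pop();
}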

Update #1: Implementation of the above interfaces

internal class EmptyStack<T> : IEmptyStack<T>
{
    public INonEmptyStack<T, IEmptyStack<T>> Push(T x)
    {
        return new NonEmptyStack<T, IEmptyStack<T>>(x, this);
    }

    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}
// ^ this could be made into a singleton per type T
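A minimal sketch of that singleton-per-type idea, as a variant of the EmptyStack<T> class above (the static Instance field is hypothetical and not part of the implementation used in the measurements):

internal class EmptyStack<T> : IEmptyStack<T>
{
    // One shared instance per closed type, e.g. one for EmptyStack<int>, one for EmptyStack<string>, ...
    public static readonly EmptyStack<T> Instance = new EmptyStack<T>();

    public INonEmptyStack<T, IEmptyStack<T>> Push(T x)
    {
        return new NonEmptyStack<T, IEmptyStack<T>>(x, this);
    }

    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}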

internal class NonEmptyStack<T, TStackBeneath> : INonEmptyStack<T, TStackBeneath>
    where TStackBeneath : IStack<T>
{
    private readonly T top;
    private readonly TStackBeneath stackBeneathTop;

    public NonEmptyStack(T top, TStackBeneath stackBeneathTop)
    {
        this.top = top;
        this.stackBeneathTop = stackBeneathTop;
    }

    public T Top { get { return top; } }

    public TStackBeneath Pop()
    {
        return stackBeneathTop;
    }

    public INonEmptyStack<T, INonEmptyStack<T, TStackBeneath>> Push(T x)
    {
        return new NonEmptyStack<T, INonEmptyStack<T, TStackBeneath>>(x, this);
    }

    INonEmptyStack<T, IStack<T>> IStack<T>.Push(T x)
    {
        return Push(x);
    }
}

Update #2: Benchmark code and results

I used the following code to measure recursive generic type instantiation times for .NET 4 on a Windows 7 SP 1 x64 (Intel U4100 @ 1.3 GHz, 4 GB RAM) notebook. This is a different, faster machine than the one I originally used, so the results do not match with the statements above.

Console.WriteLine("N, t [ms]");
int outerN = 0;
while (true)
{
    outerN++;
    var appDomain = AppDomain.CreateDomain(outerN.ToString());
    appDomain.SetData("n", outerN);
    appDomain.DoCallBack(delegate {
        int n = (int)AppDomain.CurrentDomain.GetData("n");
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        IStack<int> s = new EmptyStack<int>();
        for (int i = 0; i < n; ++i)
        {
            s = s.Push(i);  // <-- this "creates" a new type
        }
        stopwatch.Stop();
        long ms = stopwatch.ElapsedMilliseconds;
        Console.WriteLine("{0}, {1}", n, ms);
    });
    AppDomain.Unload(appDomain);
}

(Each measurement is taken in a separate app domain because this ensures that all runtime types will have to be re-created in each loop iteration.)

Here's a X-Y plot of the output:

[Line chart showing measured recursive generic type instantiation times (t vs. N)]

  • Horizontal axis: N denotes the depth of type recursion, i.e.:

    • N = 1 indicates a NonEmptyStack<T, EmptyStack<T>>
    • N = 2 indicates a NonEmptyStack<T, NonEmptyStack<T, EmptyStack<T>>>
    • etc.
  • Vertical axis: t is the time (in milliseconds) required to push N integers onto a stack. (The time needed to create runtime types, if that actually happens, is included in this measurement.)

stakx - no longer contributing
    It would really help if you could provide the implementations and your benchmarking code... oh, and an idea of whether you were *really* going to try to use code like this, which seems rather tortuous to me. – Jon Skeet Aug 14 '11 at 21:40
  • As far as I know, .NET creates a single closed generic class per unique set of generic type arguments and then reuses it, so creating 1,000 instances of EmptyStack should only require one type to be created; why do you mention 1,000? – sll Aug 14 '11 at 21:44
  • @Jon, **1.** I've appended the implementation at the end of the question. **2.** It was more an experiment than production code for regular use, but using these classes is actually not a hassle at all thanks to type inference. – stakx - no longer contributing Aug 14 '11 at 21:59
  • @sllev, it doesn't create 1,000 instances of `EmptyStack`, there are really 1,000 distinct types involved; look at the return types of `Push`. – stakx - no longer contributing Aug 14 '11 at 22:04
  • Not so sure about exponential, but there sure are odds for O(n^3). Nothing practical; if you hope to gain insight into how the generic type implementation works, then take a look at the SSCLI20 source code. – Hans Passant Aug 14 '11 at 22:18
  • @Hans, I'll follow your advice and take a look at the SSCLI (Rotor) source code. If I find something, I'll post again here. – stakx - no longer contributing Aug 18 '11 at 22:41
  • I don't see anything unexpected here. The more you push, the more closed generic types .NET has to generate. I think it becomes slower because every time it generates the classes from the start, rather than just closing one more generic over the last one. – Vladimir Perevalov Aug 20 '11 at 10:39
  • You seem to be torturing the type system in order to have a static compile-time bound on the depth of the stack. Is this actually useful? – Damien_The_Unbeliever Aug 20 '11 at 17:59
  • @Damien: Sorry for being somewhat idealistic here, but why should I not torture the type system? It's there for a reason (catching errors at compile time, for example), so why not make full use of it? Of course, as it turns out, the run-time type system implementation of .NET doesn't seem to be powerful enough for that little immutable stack experiment that I did. Still: It *could* be powerful enough, I suppose. – stakx - no longer contributing Aug 22 '11 at 16:12
  • Suppose it were possible to have Microsoft change .NET in a way that increased a hundredfold the speed at which 50+-deep generic types are created, but all other operations would be slowed down by 0.1%. Would such a change be a good or a bad thing? – supercat Oct 08 '11 at 00:11
  • I've just added benchmark code and results to my question, if anyone is still interested. – stakx - no longer contributing Feb 22 '12 at 22:04
  • As far as I can see, the only thing that's being created at run time is a new Type object for each new object, and that's pretty much limited to a new name. I can't see why creating 2,000 objects would take much longer than creating 1,000. (However, the Name property of those Type objects might get very long; I'm guessing about 20K characters for the last few, which might cause trouble.) – James Curran Aug 24 '11 at 21:09
  • No system can be all things to all people. The type system works well on non-pathological code constructs and *works* for even a pathological construct. I think @matthias is close to the correct answer: some optimization step is being forced to do more and more work because of the nesting of types. (Which isn't to say that this shouldn't be handled better, but that the current tradeoffs being made are simply not working out in this case. As more functional code hits the .NET runtime, this kind of code might even be accommodated more readily.) – Godeke Feb 24 '12 at 18:44

4 Answers


Accessing a new type causes the runtime to compile it from IL to native code (x86 etc.) and to optimize it, which also produces different results for value types and reference types.

And List<int> clearly will be optimized differently than List<List<int>>.

So EmptyStack<int>, NonEmptyStack<int, EmptyStack<int>> and so on will likewise be handled as completely different types, each of which is compiled and optimized separately. (As far as I know!)

With each further layer of nesting, the complexity of the resulting type grows and the optimization takes longer.

So the first layer takes 1 step to compile and optimize, the next layer takes 2 steps on top of the first (or so), the third layer takes 1 + 2 + 3 steps, and so on.
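One quick way to see that each constructed type really is a distinct runtime type (an illustrative check, not a measurement of the optimization cost):

// Two different closed constructions of the same generic type are two distinct runtime types.
Console.WriteLine(typeof(List<int>) == typeof(List<List<int>>));   // False

// The stack types from the question behave the same way: every Push yields a new closed type.
var stack = new EmptyStack<int>().Push(1).Push(2);
Console.WriteLine(stack.GetType());   // a NonEmptyStack`2 with nested type arguments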

matthias.lukaszek

If James and others are correct about types being created at run time, then performance is limited by the speed of type creation. So why does type creation get so slow? By definition, every new type is different from all the previous ones, so each one produces a series of increasingly different memory allocation and deallocation patterns. The speed is then simply limited by how efficiently the GC manages memory automatically. There are aggressive allocation sequences that will slow down any memory manager, no matter how good it is: the GC and the allocator spend more and more time looking for optimally sized pieces of free memory for each successive allocation.

Answer:

Because you have found a very aggressive allocation sequence that fragments memory so badly and so quickly that the GC can no longer keep up.

What one can learn from this is that really fast real-world apps (algorithmic stock-trading apps, for example) are very plain pieces of straight-line code with static data structures that are allocated only once for the whole run of the application.


In Java, computation time appears to be a little more than linear and far more efficient than what you're reporting for .NET. Using the testRandomPopper method from my answer, it takes ~4 seconds to run with N = 10,000,000 and ~10 seconds with N = 20,000,000.

Jeff Axelrod
  • This may be an interesting side note, but it doesn't attempt to answer the question. – kvb Feb 23 '12 at 20:47
  • @kvb My answer was intended to answer his question numbered 3: `Are CLR implementations other than .NET (Mono, Silverlight, .NET Compact etc.) known to exhibit the same characteristics?` Although, in fairness, Java is JVM not CLR. – Jeff Axelrod Feb 23 '12 at 21:18
  • Right, the JVM doesn't have reified generics, so it's not an apples-to-apples comparison. – kvb Feb 23 '12 at 21:22
  • @kvb That's a good point; if I'm not mistaken, my Java code is almost line-for-line identical to the .NET code and obviously can't rely on reified types. So why doesn't the .NET compiler optimize the reification data out? Or am I missing something? – Jeff Axelrod Feb 24 '12 at 05:26
  • At runtime, you could call `GetType()` on one of the instances and you'll get the full constructed type in .NET. In Java, you just get the unparameterized type. Likewise, you could use casts to subvert the type safety of your code in Java (pretending a stack is longer than it is), then get an exception some time later when popping. However, since the types are maintained at runtime in .NET, trying such a cast would fail immediately. – kvb Feb 24 '12 at 06:28
  • Well that does make sense then that the compiler wouldn't optimize it out. Maybe I'll implement it with Google Guice's `TypeLiteral` passed in (and stored along with a get method) which gives you Scala-like type manifests and see if it affects performance. Would you consider this functionally equivalent? – Jeff Axelrod Feb 24 '12 at 13:53

Is there a desperate need to have a distinction between the empty stack and the non-empty stack?

From a practical point of view, you can't pop the value of an arbitrary stack without fully qualifying the type, and after adding 1,000 values that's an insanely long type name.

Why not just do this:

public interface IImmutableStack<T>
{
    T Top { get; }
    IImmutableStack<T> Pop { get; }
    IImmutableStack<T> Push(T x);
}

public class ImmutableStack<T> : IImmutableStack<T>
{
    private ImmutableStack(T top, IImmutableStack<T> pop)
    {
        this.Top = top;
        this.Pop = pop;
    }

    public T Top { get; private set; }
    public IImmutableStack<T> Pop { get; private set; }

    public static IImmutableStack<T> Push(T x)
    {
        return new ImmutableStack<T>(x, null);
    }

    IImmutableStack<T> IImmutableStack<T>.Push(T x)
    {
        return new ImmutableStack<T>(x, this);
    }
}

You can pass around any IImmutableStack<T> and you only need to check for Pop == null to know you've hit the end of the stack.
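A short usage sketch (illustrative only) that walks such a stack until it hits that null:

IImmutableStack<int> stack = ImmutableStack<int>.Push(1);
stack = stack.Push(2);
stack = stack.Push(3);

// Walk down the stack; Pop == null marks the bottom element.
for (var s = stack; s != null; s = s.Pop)
{
    Console.WriteLine(s.Top);   // prints 3, 2, 1
}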

Otherwise this has the semantics you're trying to code without the performance penalty. I created a stack with 10,000,000 values in 1.873 seconds with this code.

Enigmativity
  • This does not answer the question at all. The question is not about implementing a stack. – Mormegil Aug 25 '11 at 12:04
  • @Mormegil - I appreciate that the question was asking some deeper questions about the CLR, but it's not clear whether the OP wanted those answers in order to make his stack work or whether he needed to find an alternative. I just went straight to the alternative, because I could see that this kind of class was going to kill the CLR and destroy all the benefit of an immutable stack. – Enigmativity Aug 25 '11 at 12:34
  • @Enigmativity: Of course the distinction between empty and non-empty stacks is not necessary. I freely admit that perhaps it is not even particularly useful in production code. The more common approach would be to have a Boolean `IsEmpty` property and specify a pre-condition `!IsEmpty` for the `Pop` operation. However: Shifting this property into the type system allows you to do certain checks at compile-time instead of at run-time, which I find quite interesting in its own right. – stakx - no longer contributing Aug 25 '11 at 18:55