Why are reference types "slower" when used as generic type arguments in .NET 5?

Question

Today I ran into this issue: when a reference type is used as the type argument of an outer generic type, methods in its nested types are slower by a factor of ~10. It does not matter which types I use; all reference types seem to "slow" the code down. (Sorry for the title, maybe somebody can find a more suitable one.)

Tested with .NET 5/Release builds.

What am I missing?

EDIT 2:

I'll try to explain the problem a little further and clean up the code. If you still want to see the old version, here is a copy:

https://gist.github.com/sneusse/1b5ee408dd3fdd74fcf9d369e144b35f

The new code illustrates the same issue with hopefully less distraction.

The class WthGeneric<T> is instantiated twice.
The first instance uses a reference type as the type argument (here: object).
The second instance uses a value type as the type argument (here: long).
Both are instantiations of the same generic class, so both have the same method WhatIsHappeningHere.
Neither of the instances uses the generic argument in any way.

This leads to the question: why does the same instance method take 10x longer on one instance than on the other?

Output:

System.Object: 516,8448ms
System.Int64: 50,6958ms

Code:

using System;
using System.Diagnostics;
using System.Linq;

namespace Perf
{
    public interface IWthGeneric
    {
        int WhatIsHappeningHere();
    }
    
    // This is a generic class. Note that the generic
    // type argument 'T' is _NOT_ used at all!
    public class WthGeneric<T> : IWthGeneric
    {
        // This is part of the issue.
        // If this field is not accessed, or if it is moved *outside*
        // of the generic 'WthGeneric' class, the code is fast again,
        // ** also with reference types **.
        public static int StaticVar = 12;

        static class NestedClass
        {
            public static int Add(int value) => StaticVar + value;
        }

        public int WhatIsHappeningHere()
        {
            var x = 0;
            for (int i = 0; i < 100000000; i++)
            {
                x += NestedClass.Add(i);
            }
            return x;
        }
    }
    
    public class RunMe
    {
        public static void Run()
        {
            // The interface is used so nothing could ever get inlined.
            var wthObject  = (IWthGeneric) new WthGeneric<object>();
            var wthValueType = (IWthGeneric) new WthGeneric<long>();

            void Test(IWthGeneric instance)
            {
                var sw = Stopwatch.StartNew();
                var x  = instance.WhatIsHappeningHere();
                Console.WriteLine(
                    $"{instance.GetType().GetGenericArguments().First()}: " +
                    $"{sw.Elapsed.TotalMilliseconds}ms");
            }

            for (int i = 0; i < 10; i++)
            {
                Test(wthObject);
                Test(wthValueType);
            }
        }
    }
}

Did you try to take a look at the IL code? I guess it could explain a lot to you. — GrayCat, Apr 12 '21 at 16:22
@GrayCat IL won't explain this one. The thing is that the generic type argument is not used. Furthermore, since it is not used, it is not stored (so it won't be a cache locality issue, nor garbage collection), and it is not boxed. This appears to be an issue with the jitter. — Theraot, Apr 12 '21 at 16:27
I added the IL code but as @Theraot mentioned this might not be the issue here. — sneusse, Apr 12 '21 at 16:32
I built the same code with .NET Framework 4.8 and Core 3.1, the issue is still the same — sneusse, Apr 12 '21 at 16:38
It appears to happen with structs with generic reference types too. I tested with `ArraySegment` and it was fast, but `ArraySegment` was slow. I made a custom `struct F{public object A;}` that was fast. But `F — Theraot, Apr 12 '21 at 16:48
Another update: I removed the "generic" requirement for the problem. Any nested types behave in the same way. — sneusse, Apr 12 '21 at 17:16
Isn't the problem here that both versions are doing different things? One is doing only add, while the other calls a function, accesses a field on a class, and then adds? You could verify looking at IL of `WthGeneric`, not `RunMe`. — GrayCat, Apr 12 '21 at 17:17
@GrayCat yeah sure the two methods do different things. The issue is that one of them is slower when used with reference types as type arguments. But I'd expect them to perform with the same speed no matter which type argument is used (because the type argument isn't used at all). — sneusse, Apr 12 '21 at 17:24
"sure the two methods do different things. The issue is that one of them is slower" -- what? What are you comparing here? If a function call is slower than a variable? What references are you even talking about, there's no references used in your code. — Blindy, Apr 12 '21 at 17:45
Looks similar to [this issue](https://github.com/dotnet/runtime/issues/44457) I created recently: that also affects .NET 5 when using my generic `ArraySegment`-like structs in tight loops, and the difference is not in the IL but in the JITted code. — György Kőszeg, Apr 12 '21 at 18:11
@Blindy Regarding the two methods: The performance of one of the methods (`Wth`) is sensitive to whether or not the generic arguments are reference types, while the performance of the other method (`NotSlow`) isn't. Remove `NotSlow` and this issue remains: why is `Wth` slower with reference-type generic arguments? — Theraot, Apr 12 '21 at 19:07
@GrayCat Ignore `NotSlow`. Read my previous comment. `Wth` is about an order of magnitude slower with a reference-type generic argument, even though the generic type parameter is not used. Why? `NotSlow` is just there for contrast; it does not have that problem. — Theraot, Apr 12 '21 at 19:10
I updated the question and removed the distracting method. I hope it's clearer now :) sorry! — sneusse, Apr 12 '21 at 20:54

Theraot · Answer 1 · 2021-04-12T21:23:29.400

I'm ready to say this is the jitter's fault. Perhaps "fault" is too strong a word. The jitter does not optimize this case.

Using SharpLab to look at the JIT asm of this code:

using SharpLab.Runtime;

[JitGeneric(typeof(int))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}

Note: The attribute JitGeneric(typeof(int)) is telling SharpLab to JIT this code with the generic argument int. Without a generic argument, it is not possible to JIT a generic type.

We get this:

; Core CLR v5.0.321.7212 on x86

A`1[[System.Int32, System.Private.CoreLib]]..ctor()
    L0000: ret

A`1+B[[System.Int32, System.Private.CoreLib]].C()
    L0000: mov ecx, 0x2051c600
    L0005: xor edx, edx
    L0007: call 0x5e646b70
    L000c: mov eax, [eax+4]
    L000f: ret

Meanwhile, for this code:

using SharpLab.Runtime;

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;

    public static class B
    {
        public static int C() => X;
    }
}

Note: Yes, this is the same class, except now I'm telling SharpLab to JIT it for the generic argument object.

We get this:

; Core CLR v5.0.321.7212 on x86

A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
    L0000: ret

A`1+B[[System.__Canon, System.Private.CoreLib]].C()
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: push eax
    L0004: mov [ebp-4], ecx
    L0007: mov edx, [ecx+0x20]
    L000a: mov edx, [edx]
    L000c: mov edx, [edx+8]
    L000f: test edx, edx
    L0011: je short L0015
    L0013: jmp short L0021
    L0015: mov edx, 0x2046cec4
    L001a: call 0x5e4e4090
    L001f: mov edx, eax
    L0021: mov ecx, edx
    L0023: call 0x5e4fa760
    L0028: mov eax, [eax+4]
    L002b: mov esp, ebp
    L002d: pop ebp
    L002e: ret

We observe that for the reference-type generic argument, we get much longer code. Is that code necessary? Well, we are accessing a public static field of a generic class. Let us see how that looks if the other class is not nested:

using SharpLab.Runtime;

public static class Bint
{
    public static int C() => A<int>.X;
}

public static class Bobject
{
    public static int C() => A<object>.X;
}

[JitGeneric(typeof(object))]
public class A<T>
{
    public static int X;
}

We get this code:

; Core CLR v5.0.321.7212 on x86

Bint.C()
    L0000: mov ecx, 0x209fc618
    L0005: xor edx, edx
    L0007: call 0x5e646b70
    L000c: mov eax, [eax+4]
    L000f: ret

Bobject.C()
    L0000: mov ecx, 0x209fc618
    L0005: mov edx, 1
    L000a: call 0x5e646b70
    L000f: mov eax, [eax+4]
    L0012: ret

A`1[[System.__Canon, System.Private.CoreLib]]..ctor()
    L0000: ret

Therefore, no, we don't need the long version of the code. We must conclude that the jitter is not optimizing this case appropriately.
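
The question's own code comment notes that the loop is fast again when the static field is moved outside the generic class. A minimal sketch of that rearrangement follows. The names SharedStatics, WthGenericMoved and Sum are hypothetical and not part of the original code, and no timing is claimed here. Note that this changes semantics: the field is now a single value shared by every instantiation instead of one copy per constructed type.

```csharp
using System;

// Hypothetical non-generic holder (not in the original code). Unlike a static
// field of WthGeneric<T>, this field exists once and its location does not depend on T.
public static class SharedStatics
{
    public static int StaticVar = 12;
}

public class WthGenericMoved<T>
{
    static class NestedClass
    {
        // Reads a field of a non-generic class instead of a field of the enclosing generic type.
        public static int Add(int value) => SharedStatics.StaticVar + value;
    }

    // Same loop as WhatIsHappeningHere, with the iteration count as a parameter.
    public int Sum(int iterations)
    {
        var x = 0;
        for (int i = 0; i < iterations; i++)
        {
            x += NestedClass.Add(i);
        }
        return x;
    }
}

public static class MovedDemo
{
    public static void Main()
    {
        // Both instantiations compute the same value from the shared field.
        Console.WriteLine(new WthGenericMoved<object>().Sum(4));
        Console.WriteLine(new WthGenericMoved<long>().Sum(4));
    }
}
```

If each instantiation really needs its own value, another option under the same assumption is to read the static field into a local variable once before the loop.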

I don't see how the second version is relevant: in this version the type is statically known and can therefore be inlined. But the first version uses the `System.__Canon` fake object type, and the type is not known in advance, so there is no way to optimize it — Charlieface, Apr 12 '21 at 20:32
@Charlieface the type is known at the time it is jitted, isn't it? We can agree that the jitter is not optimizing this case. To be fair, I don't know enough about the jitter to tell why, and thus I did not attempt to provide a rationale. — Theraot, Apr 12 '21 at 20:38
Thanks for the explanation and the link to SharpLab, I wasn't aware of this tool - awesome :) — sneusse, Apr 12 '21 at 20:43
As I say above, no, the jitter only generates one version for all reference types — Charlieface, Apr 13 '21 at 00:13

Charlieface · Accepted Answer · 2021-04-13T10:09:29.940


Not 100% sure, but I think I know why the JIT is not optimizing this:

As I understand it, each generic type generally has only one version of the JITted code shared by all reference-type arguments, compiled against the placeholder type System.__Canon, with the actual type passed in as a typeref parameter. For value types, in contrast, each instantiation is compiled separately.

This is because a reference type always looks the same to the JIT: a pointer to an object whose first field is a pointer to its typeref and method table. Value types are all different, so the code for each must be custom-built.

You say you don't use the type parameter, but actually you do. When you access a static field of a generic type, each instantiated generic type needs a separate copy of the static field.

So the shared code must now do a pointer lookup at run time, through the type parameter's typeref, to get the static field's value.

But in the valuetype version, the typeref is statically known, therefore it's a straight memory access every time.
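
The point about separate copies can be checked with a small sketch. Counter<T> and StaticsDemo are hypothetical names introduced only for illustration. Each constructed type gets its own static field, including different reference-type instantiations, which is why the field's location depends on the type argument.

```csharp
using System;

// Hypothetical type for illustration; not part of the question's code.
public static class Counter<T>
{
    public static int Value;
}

public static class StaticsDemo
{
    // Writes a different value through three instantiations and reads them back.
    public static (int, int, int) Run()
    {
        Counter<object>.Value = 1;
        Counter<string>.Value = 2; // a different field from Counter<object>.Value
        Counter<long>.Value = 3;
        return (Counter<object>.Value, Counter<string>.Value, Counter<long>.Value);
    }

    public static void Main()
    {
        var (o, s, l) = Run();
        Console.WriteLine($"{o} {s} {l}"); // prints "1 2 3"
    }
}
```

None of the writes overwrites another, so Counter<object> and Counter<string> keep distinct storage for Value.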

edited Apr 13 '21 at 10:09

answered Apr 12 '21 at 20:31

Charlieface


I want to point out that `ArraySegment – Theraot Apr 12 '21 at 20:48
Well, this would explain the behaviour. Do you have any references or a link to some writeup/code/...? – sneusse Apr 12 '21 at 20:50
@sneusse I found this article: [.NET Generics under the hood](https://alexandrnikitin.github.io/blog/dotnet-generics-under-the-hood/) which among other things says "Everything is pretty straightforward when you call a specialized(typed) generic method from a regular method. All checks and type lookups can be done during the compilation (inc JIT) phase. But things get tricky when you call a generic method from another generic method where you don’t know the type." – Theraot Apr 12 '21 at 21:15
Now what strikes me is that the jitter can optimize value types, which "are all different", in the same scenarios. I don't think it is a matter of the type not being known. At JIT time, the type is known. The fact that it can be done for (some) value types is evidence. Instead, the developers have chosen not to optimize cases of reference-type generic arguments, which I'd guess is a matter of trade-offs rather than viability. @sneusse – Theraot Apr 12 '21 at 21:19

@Theraot: Thanks for the article! It seems the problem is indeed that all generic types with reference types as arguments share the same EEClass structure, which holds the information about the fields. So the problem is not really a problem but a design choice to prefer memory > speed for generic types. Relevant source is here: https://github.com/dotnet/coreclr/blob/master/src/vm/generics.cpp – sneusse Apr 12 '21 at 22:01
@sneusse Not memory over speed, but JIT time over run time. They could have had the JIT take extra time to figure out better native code, but then it is slow because the JIT is slow. Or they could have the JIT emit native code quicker, but then it is slow because the native code is not optimal. Of course, the native code may run many times but only be jitted once. You may also be interested in RyuJIT, which should be able to replace suboptimal jitted code with better code at run time… Er… does not seem to be happening either. Feature request? – Theraot Apr 12 '21 at 22:13
@Theraot I don't think you quite got the nuance of what I was saying. The enclosing generic type is not relevant, it is the instantiated type parameter, so `ArraySegment – Charlieface Apr 13 '21 at 00:11
Similarly, I'm not quite sure you understood me. I meant using `ArraySegment – Theraot Apr 13 '21 at 00:23
@Theraot Hum, interesting. I suppose any time there is a reference-type parameter then it will use `System.__Canon`, because it can equally be replaced by any other reference-type. – Charlieface Apr 13 '21 at 09:38
