Delegate caching behavior changes in Roslyn

Question

Given the following code:

public class C
{
    public void M()
    {
        var x = 5;
        Action<int> action = y => Console.WriteLine(y);
    }
}

Using VS2013, .NET 4.5. When looking at the decompiled code, we can see that the compiler is caching the delegate at the call site:

public class C
{
    [CompilerGenerated]
    private static Action<int> CS$<>9__CachedAnonymousMethodDelegate1;
    public void M()
    {
        if (C.CS$<>9__CachedAnonymousMethodDelegate1 == null)
        {
            C.CS$<>9__CachedAnonymousMethodDelegate1 = new Action<int>(C.<M>b__0);
        }
        Action<int> arg_1D_0 = C.CS$<>9__CachedAnonymousMethodDelegate1;
    }
    [CompilerGenerated]
    private static void <M>b__0(int y)
    {
        Console.WriteLine(y);
    }
}

Looking at the same code decompiled in Roslyn (using TryRoslyn), yields the following output:

public class C
{
    [CompilerGenerated]
    private sealed class <>c__DisplayClass0
    {
        public static readonly C.<>c__DisplayClass0 CS$<>9__inst;
        public static Action<int> CS$<>9__CachedAnonymousMethodDelegate2;
        static <>c__DisplayClass0()
        {
            // Note: this type is marked as 'beforefieldinit'.
            C.<>c__DisplayClass0.CS$<>9__inst = new C.<>c__DisplayClass0();
        }
        internal void <M>b__1(int y)
        {
            Console.WriteLine(y);
        }
    }
    public void M()
    {
        Action<int> arg_22_0;
        if (arg_22_0 = C.<>c__DisplayClass0.CS$<>9__CachedAnonymousMethodDelegate2 == null)
        {
            C.<>c__DisplayClass0.CS$<>9__CachedAnonymousMethodDelegate2 =
                            new Action<int>(C.<>c__DisplayClass0.CS$<>9__inst.<M>b__1);
        }
    }
}

We can now see that the delegate is now lifted into a private class inside C, a similar behavior that we're used to seeing when closing over an instance variable / field (closure).

I know this is an implementation detail which may be subject to change at any given time.

Still I wonder, what are the benefits of lifting the delegate into a new class and caching it there over simply caching it at the call site?

Edit:

This issue talks about the same behavior as asked here.

[Almost but not quite a dupe](http://stackoverflow.com/questions/27725939/why-the-compiler-adds-an-extra-parameter-for-delegates-when-there-is-no-closure/27726206#27726206): the behaviour has changed yet again? — , Jun 17 '15 at 17:34
@hvd Not quite, this isn't caused by a bug, this is the behavior for any given cached delegate in Roslyn. — Yuval Itzchakov, Jun 17 '15 at 17:35
"Bug report" != "bug" :) Never claimed there was a bug in that other question nor here. — , Jun 17 '15 at 17:37
It seems that question is more about bound parameter optimization. I think this is a bit more general, since a decision was made to change the implementation behavior. It might be related.. — Yuval Itzchakov, Jun 17 '15 at 17:40
The compiler should move the delegate instantiation to the static ctor as well. No need to run it each time `M` runs. — usr, Jun 17 '15 at 18:04
@YuvalItzchakov that is true but the branch happens each time. This is not required. A beforefieldinit ctor is usually run before JIT time by the both JITs that we have right now. — usr, Jun 17 '15 at 18:38

score 29 · Accepted Answer · answered Jun 17 '15 at 20:22

29

Yes. The most important part is that the method containing lambda implementation is now an instance method.

You can see a delegate as a middleman receiving an instance call through Invoke and dispatching that call according to the calling convention of the implementing method.

Note that there are platform ABI requirements that specify how arguments are passed, how results are returned, what arguments are passed via registers and in which ones, how "this" is being passed and so on. Violating these rules may have bad impact on tools that rely on stack-walking, such as debuggers.

Now, if the implementing method is an instance method, the only thing that needs to happen inside the delegate is to patch "this", which is the delegate instance at the time of Invoke, to be the enclosed Target object. At that point, since everything else is already where it needs to be, the delegate can jump directly to the implementing method body. In many cases this is noticeably less work than what would need to happen if the implementing method was a static method.

answered Jun 17 '15 at 20:22

VSadov

961
8
4

@VSdov Thanks for the insightful explanation. If invoking an instance method less work, is there any special reason the compiler team went through the hoops of caching it as a *static* method at the callsite? – Yuval Itzchakov Jun 18 '15 at 05:34
2

@YuvalItzchakov - if the question is "why the original implementaiton used static methods" - I do not know for sure. I heard that the very original implementation of delegates was not very smart or perfomran and they have been improved over time. If that is really the case, it might be that originally instance/static was not making enough difference and the team went with the choice that was simpler to implement. – VSadov Jun 19 '15 at 20:43
1

But that seems contradictory to what you said in your answer. I'd assume that if the current Roslyn implementation is the easier between the two, than why originally use a static method that would make them need to jump through hoops to make it work? Or did I get that wrong? – Yuval Itzchakov Jun 19 '15 at 21:04
4

Current implementation is easier for CPU. It is not easier to implement :-). In addition to the call site caching there is also a singleton display class instance to deal with. There is a static constructor. Not a lot extra, but there are definitely more hoops to jump through. – VSadov Jun 21 '15 at 17:24
Thanks alot for the insights. – Yuval Itzchakov Jun 21 '15 at 17:41
Also, your sample may not show the typical picture. For a more general codegen, you should try invoking the action after creating it. Otherwise optimizer can elide dead assignments where it is easy to see there are no observable sideeffects. – VSadov Jun 21 '15 at 17:53

score 18 · Answer 2 · answered Jun 17 '15 at 16:50

18

Still I wonder, what are the benefits of lifting the delegate into a new class and caching it there over simply caching it at the call site?

You've missed one other really important detail - it's now an instance method. I believe that's the key here. IIRC, it was found that invoking a delegate which was "backed" by an instance method was faster than invoking a delegate backed by a static method - which is the motivation behind the change.

This is all hearsay, vaguely remembered from spending time with Dustin Campbell and Kevin Pilch-Bisson (both from the Roslyn team) at CodeMash, but it would make sense given the code you've shown.

(I haven't validated the performance difference for myself, and it sounds like it's backwards... but CLR internals can be funny like that...)

answered Jun 17 '15 at 16:50

Jon Skeet

1,421,763
867
9,128
9,194

1

That is very interesting. It would be odd that the creation of a class with an instance method would be faster than invoking a static method. I've always read that by emitting a `call` instead of a `callvirt`, you should be saving a couple of milliseconds. – Yuval Itzchakov Jun 17 '15 at 16:55
1

@YuvalItzchakov: Yes, but this won't be calling `call` or `callvirt` - it'll be doing delegate invocation magic. As I say, I don't know the details of *why* it's faster. Have pinged Dustin and Kevin on Twitter - they may well have a good explanation :) – Jon Skeet Jun 17 '15 at 16:57
2

@YuvalItzchakov Milliseconds is the wrong unit of time there. You'd be talking nanoseconds, not milliseconds. – Servy Jun 17 '15 at 16:57
1

@Servy Yeah well, you get what I meant. – Yuval Itzchakov Jun 17 '15 at 16:58
I heard this as well. I feel this should be fixed at the CLR level. Once it's fixed there this new behavior of the C# compiler will likely result in slower binaries because the effect reverses. Well, the JIT is not receiving much perf investment so this might never actually happen. – usr Jun 17 '15 at 18:14
20

The reason it's faster is because delegate invokes are optimized for instance methods and have space on the stack for them. To call a static method they have to shift parameters around. – Kevin Pilch Jun 17 '15 at 18:21
3

@Kevin Interesting. Anywhere we can read on that behavior? Perhaps something more in depth? – Yuval Itzchakov Jun 17 '15 at 18:27

Delegate caching behavior changes in Roslyn

2 Answers2

Linked