
I was curious to see what the performance difference is between returning a value from a method and returning it through an Action parameter.

There is a somewhat related question: "Performance of calling delegates vs methods".

But for the life of me I can't explain why returning a value would be ~30% slower than calling a delegate to return the value. Is the .NET JIT compiler (not the C# compiler) inlining my simple delegate? I didn't think it did that.

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();

        A aa = new A();

        long l = 0;
        for( int i = 0; i < 100000000; i++ )
        {
            aa.DoSomething( i - 1, i, r => l += r );
        }

        sw.Stop();
        Trace.WriteLine( sw.ElapsedMilliseconds + " : " + l );

        sw.Reset();
        sw.Start();

        l = 0;
        for( int i = 0; i < 100000000; i++ )
        {
            l += aa.DoSomething2( i - 1, i );
        }

        sw.Stop();
        Trace.WriteLine( sw.ElapsedMilliseconds + " : " + l );
    }
}
class A
{
    private B bb = new B();

    public void DoSomething( int a, int b, Action<long> result )
    {
        bb.Add( a,b, result );
    }

    public long DoSomething2( int a, int b  )
    {
        return bb.Add2( a,b );
    }

}
class B
{
    public void Add( int a, int b, Action<long> result )
    {
        result( a + b );
    }

    public long Add2( int i, int i1 )
    {
        return i + i1;
    }
}
headsling
  • I can't reproduce this. For me the `DoSomething2` method is about twice as fast. I tested this in .NET 4.0 with an x86 build. The IL does not contain any inlining yet, and it also seems reasonable that `DoSomething` is slower due to the additional `Action.Invoke` method call. – Dirk Vollmar Sep 16 '10 at 15:22
  • I can't reproduce it either. On my machine `DoSomething2` is also significantly faster than `DoSomething`. – Dan Tao Sep 16 '10 at 15:26
  • Debug vs. Release, probably. For me, the results are opposite in those two builds. – Anthony Pegram Sep 16 '10 at 15:27
  • For me, even in the Debug version `DoSomething2` is faster. However, the benchmark is comparing different things; `DoSomething` should call the delegate directly and not yet another method to make this a fair benchmark. – Dirk Vollmar Sep 16 '10 at 15:30
  • .net 4.0 seems to have changed things - i'll give that a go – headsling Sep 17 '10 at 12:25

4 Answers

2

I made a couple of changes to your code.

  • Moved new A() before the timed section.
  • Added warmup code before the timed section to get the methods JIT'ed.
  • Created an Action<long> reference before the timed section and loop so that it does not have to be created on each iteration. This one seemed to have a big impact on execution time.
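
The third change can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the measurements: the `HoistDemo` class, its method names, and the small iteration count are hypothetical. The idea is that a lambda capturing a local variable is turned into a new delegate object each time the lambda expression is evaluated, so creating it once before the loop avoids one allocation per iteration.

```csharp
using System;

class B
{
    public void Add(int a, int b, Action<long> result)
    {
        result(a + b);
    }
}

// Hypothetical demo class; the names are not from the original post.
public static class HoistDemo
{
    // Lambda written inline: a new delegate is created on every iteration.
    public static long SumPerIteration(int iterations)
    {
        B bb = new B();
        long total = 0;
        for (int i = 0; i < iterations; i++)
        {
            bb.Add(i - 1, i, r => total += r);
        }
        return total;
    }

    // Delegate created once before the loop and reused.
    public static long SumHoisted(int iterations)
    {
        B bb = new B();
        long total = 0;
        Action<long> accumulate = r => total += r;
        for (int i = 0; i < iterations; i++)
        {
            bb.Add(i - 1, i, accumulate);
        }
        return total;
    }

    public static void Main()
    {
        // Both variants compute the same sum; only the allocation pattern differs.
        Console.WriteLine(SumPerIteration(1000) == SumHoisted(1000));
    }
}
```

Both methods return the same value, so any timing difference between them should come from delegate creation rather than from the addition itself.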

Here are my results after making the above changes. The vshost column indicates whether the code was executing inside the vshost.exe process (by running directly from Visual Studio). I was using Visual Studio 2008 and targeted .NET 3.5 SP1.

vshost?   Method          Debug   Release   (elapsed ms)
----------------------------------------
 YES      Action<long>     6405     3827
          return value    11059     3092

 NO       Action<long>     4214     1691
          return value     4607      811

Notice how you get different results depending on the build configuration and the execution environment. The results are interesting if nothing else. If I get time I might edit my answer to provide a theory.

Brian Gideon
  • warmup code wasn't making much difference in my numbers so i removed it for brevity. I specifically left the action lambdas in as i wanted to see what the cost would be on both time and memory usage (very little difference!) I really have to remember to test outside of VS! cheers – headsling Sep 17 '10 at 12:21
  • @headsling: The warmup code made no difference for me either. I sort of expected that. But, I did see a significant difference by lifting the action delegate outside of the loop. That now makes me wonder...could that be one of the optimizations performed anyway in a release build? Lifting instructions outside of loops is not new so it is reasonable. – Brian Gideon Sep 17 '10 at 12:35
  • interesting - i agree that it seems likely that the release build might lift my action delegate out... i'll have a play with that – headsling Sep 20 '10 at 20:37
1

Strangely, I'm not seeing the behavior you're describing when running a Release build in VS. I am seeing it when running a Debug build. The only thing I can figure is that there's added overhead with the return-based approach when running the Debug build, though I'm not clever enough to see why.

Here's something else that's interesting: this discrepancy disappears when I switch to a x64 build (Release or Debug).

If I were to venture a guess (completely unsubstantiated), it might be that the cost of passing the 64-bit long as a return value in both B.Add2 and A.DoSomething2 outweighs that of passing the Action<long> in a 32-bit environment. In a 64-bit environment, this savings would vanish as the Action<long> would require 64 bits as well. In a Release build in either configuration, the cost of passing the long probably disappears as both B.Add2 and A.DoSomething2 seem like prime candidates for inlining.

Somebody who knows way more about this than I do: feel free to totally refute everything I just said. We're all here to learn, after all ;)

Dan Tao
  • The reason that you get that result in the Debug build is actually not that the method call is slower than calling the delegate. The result is simply flawed because the overall overhead for loading and jitting is much higher in the Debug version. To fix this, you would have to add some warm-up code so that everything is already loaded and jitted *before* you start measuring. – Dirk Vollmar Sep 16 '10 at 15:33
  • You could verify your guess by simply replacing `long` with `int`. What I'd rather assume though is that the 64-bit jitter is able to perform some optimization such as inlining which the 32-bit jitter does not apply. As far as I know, the 64-bit and 32-bit runtime have been developed separately. – Dirk Vollmar Sep 16 '10 at 15:42
  • @0xA3: Yeah, my hypothesis didn't seem to hold up, sadly (for me). Changing `long` to `int` did not reverse the strange discrepancy noted by the OP (nor did adding warm-up code, actually, as you can also see from @Brian's answer). – Dan Tao Sep 16 '10 at 16:41
  • The oddest thing that stuck out for me was that the traditional return method was only marginally faster with a release build when run by vshost.exe, but it is dramatically faster when run standalone. I wonder if vshost runs the CLR with a flag that disables method inlining and perhaps other optimizations? – Brian Gideon Sep 16 '10 at 19:36
  • as i mentioned above - warm up code didn't make that large a difference so i left it out. – headsling Sep 17 '10 at 12:26
1

Well for starters your call to new A() is being timed the way you currently have your code set up. You need to make sure you're running in release mode with optimizations on as well. Also you need to take the JIT into account--prime all your code paths so you can guarantee they are compiled before you time them (unless you are concerned about start-up time).
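
A minimal sketch of that setup, assuming the OP's `A` and `B` classes reduced to the return-based path. The `WarmupDemo` class and `TimedSum` method are hypothetical names introduced here for illustration, and this is not the code behind any of the numbers in this thread:

```csharp
using System;
using System.Diagnostics;

class B
{
    public long Add2(int i, int i1)
    {
        return i + i1;
    }
}

class A
{
    private B bb = new B();

    public long DoSomething2(int a, int b)
    {
        return bb.Add2(a, b);
    }
}

// Hypothetical harness; the names are not from the original post.
public static class WarmupDemo
{
    public static long TimedSum(int iterations, out long elapsedMs)
    {
        A aa = new A();          // constructed outside the timed region
        aa.DoSomething2(0, 0);   // warm-up call so the code path is JIT-compiled before timing

        long total = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            total += aa.DoSomething2(i - 1, i);
        }
        sw.Stop();

        elapsedMs = sw.ElapsedMilliseconds;
        return total;
    }

    public static void Main()
    {
        long ms;
        long total = TimedSum(100000000, out ms);
        Console.WriteLine(ms + " : " + total);
    }
}
```

Run it as a Release build outside the debugger; otherwise the JIT optimizations being measured may be disabled.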

I see an issue when you try to time a large quantity of primitive operations (the simple addition). In this case you can't make any definitive conclusions since any overhead will completely dominate your measurements.

edit: In release mode targeting .NET 3.5 in VS2008 I get:

1719 : 9999999800000000
1337 : 9999999800000000

Which seems to be consistent with many of the other answers. Using ILDasm gives the following IL for B.Add:

  IL_0000:  ldarg.3
  IL_0001:  ldarg.1
  IL_0002:  ldarg.2
  IL_0003:  add
  IL_0004:  conv.i8
  IL_0005:  callvirt   instance void class [mscorlib]System.Action`1<int64>::Invoke(!0)
  IL_000a:  ret

Where B.Add2 is:

  IL_0000:  ldarg.1
  IL_0001:  ldarg.2
  IL_0002:  add
  IL_0003:  conv.i8
  IL_0004:  ret

So it looks as though you're pretty much just timing a load and callvirt.

Ron Warholic
  • Thanks for the comments - clearly release mode (which i didn't test) makes a difference in the results .. i'm still confused as to why debug mode makes a difference in this case. My test times are not materially affected by warm up (which i omitted for brevity) or by the new A(), given the order and the magnitude of the iterations. – headsling Sep 17 '10 at 12:20
-1

Why not use reflector to find out?

Kell