
Consider this code:

static void FillUsingAsNullable()
{
  int?[] arr = new int?[1 << 24];
  var sw = System.Diagnostics.Stopwatch.StartNew();
  for (int i = 0; i < arr.Length; ++i)
    arr[i] = GetObject() as int?;
  Console.WriteLine("{0:N0}", sw.ElapsedTicks);
}

static void FillUsingOwnCode()
{
  int?[] arr = new int?[1 << 24];
  var sw = System.Diagnostics.Stopwatch.StartNew();
  for (int i = 0; i < arr.Length; ++i)
  {
    object temporary = GetObject();
    arr[i] = temporary is int ? (int?)temporary : null;
  }
  Console.WriteLine("{0:N0}", sw.ElapsedTicks);
}

static object GetObject()
{
  // Uncomment exactly one of the following:
  //return new object();
  //return 42;
  //return null;
}

As far as I can see, the methods FillUsingAsNullable and FillUsingOwnCode should be equivalent.
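A quick way to confirm the claimed equivalence is to run both expressions on the three values `GetObject` can return. This is a sketch, not part of the original question. It assumes C# 9 or later (top-level statements), and the helper names `ViaAs` and `ViaIsAndCast` are hypothetical.

```csharp
using System;

// Hypothetical helpers that mirror the loop bodies of the two methods.
static int? ViaAs(object o) => o as int?;
static int? ViaIsAndCast(object o) => (o is int) ? (int?)o : null;

// The three values GetObject can return in the question.
object[] inputs = { new object(), 42, null };

foreach (object input in inputs)
{
    int? a = ViaAs(input);
    int? b = ViaIsAndCast(input);
    // Lifted == on int?: true when both are null or both hold the same value.
    Console.WriteLine($"{input ?? "null"}: as = {a?.ToString() ?? "null"}, is/cast = {b?.ToString() ?? "null"}, equal = {a == b}");
}
```

Both helpers should report `equal = True` for every input; the values match, so any speed difference comes from how the compiler and JIT translate them.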

But it looks like the "own code" version is clearly faster.

There are two platform targets to choose from ("x86" or "x64"), two build configurations ("Debug" or "Release" with optimizations), and three choices for what the GetObject method returns. As far as I can see, in all 2 * 2 * 3 == 12 cases, the "own code" version is significantly faster than the "as nullable" version.

The question: Is as with Nullable<> unnecessarily slow, or am I missing something here (quite likely)?

Related thread: Performance surprise with “as” and nullable types.

Jeppe Stig Nielsen
  • "1 << 24" Why are you deliberately writing obtuse code? (The constant is 0x1000000.) – Steve Wellens Jan 27 '14 at 19:42
  • @SteveWellens Sorry, just replace it with `0x1000000`, `16777216`, `20000000`, or whatever you like. I sometimes use `1 << number` for powers of two ("two to the `number`th power"), but certainly not as an attempt to make the code obscure. – Jeppe Stig Nielsen Jan 27 '14 at 19:56

3 Answers


The generated IL is different, but not fundamentally. If the JIT were good (it is not, and this is no news), both versions could compile to exactly the same x86 code.

I compiled this with VS2010 Release AnyCPU.

as version:

L_0015: call object ConsoleApplication3.Program::GetObject()
L_001a: stloc.3 
L_001b: ldloc.0 
L_001c: ldloc.2 
L_001d: ldelema [mscorlib]System.Nullable`1<int32>
L_0022: ldloc.3 
L_0023: isinst [mscorlib]System.Nullable`1<int32>
L_0028: unbox.any [mscorlib]System.Nullable`1<int32>
L_002d: stobj [mscorlib]System.Nullable`1<int32>

?: version:

L_0015: call object ConsoleApplication3.Program::GetObject()
L_001a: stloc.3 
L_001b: ldloc.0 
L_001c: ldloc.2 
L_001d: ldelema [mscorlib]System.Nullable`1<int32>
L_0022: ldloc.3 
L_0023: isinst int32
L_0028: brtrue.s L_0036 // branch here
L_002a: ldloca.s nullable
L_002c: initobj [mscorlib]System.Nullable`1<int32>
L_0032: ldloc.s nullable
L_0034: br.s L_003c
L_0036: ldloc.3 
L_0037: unbox.any [mscorlib]System.Nullable`1<int32>
L_003c: stobj [mscorlib]System.Nullable`1<int32>

The opcodes are documented on MSDN. This IL is not difficult to understand, and anyone can do it, although it takes a little time for the inexperienced eye.

The main difference is that the version with a branch in the source code also has a branch in the generated IL. It is just a little less elegant. The C# compiler could have optimized this away, but the team's policy is to leave optimizations to the JIT. That would work fine if the JIT were getting the necessary investment.

You could analyze this further by looking at the x86 code emitted by the JIT. You would likely find an obvious difference, but it would be an unspectacular discovery. I will not invest the time to do that.


I modified the as version to use a temporary as well to have a fair comparison:

            var temporary = GetObject();
            arr[i] = temporary as int?;
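For completeness, the whole modified `as` method might look like the sketch below. It assumes C# 9 or later; the method name `FillUsingAsNullableWithTemporary`, the array parameter, and returning the tick count instead of printing it are illustrative changes rather than the answerer's exact code, and `GetObject` is fixed to the `return 42` option.

```csharp
using System;
using System.Diagnostics;

// Same shape as the question's FillUsingAsNullable, but the call result is
// stored in a local first, so both variants do the same work before the cast.
static long FillUsingAsNullableWithTemporary(int?[] arr)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < arr.Length; ++i)
    {
        object temporary = GetObject();
        arr[i] = temporary as int?;
    }
    return sw.ElapsedTicks;
}

// One of the question's three options; swap in new object() or null to compare.
static object GetObject() => 42;

var data = new int?[1 << 24];
long ticks = FillUsingAsNullableWithTemporary(data);
Console.WriteLine("{0:N0}", ticks);
```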
usr
  • I'm a little lost here, but doesn't this imply that the `as` should be faster, not slower as the OP said? – Jeff Jan 27 '14 at 20:40
  • @Jeff No, because the JIT can do whatever it wants to; there's no direct relationship between its input and its output. In the meantime I looked at the code the JIT generates, and in the branching case it is better. The `as` case seems to make a function call into the CLR to do the casting. In the branching case, the JIT just compares a word from the object header to a constant (presumably the method table of Int32). Who knows why it does that. There's no deeper reason here; maybe the branching version hit the right patterns in the optimizer. It is just not a very exhaustive optimizer (which I know from experience). – usr Jan 27 '14 at 20:46
  • Here's another case of the CLR JIT failing: http://stackoverflow.com/a/20399485/122718 Might be more fun to read because I'm going down to x86 in that answer. – usr Jan 27 '14 at 20:49
  • Isn't the problem with the `as` version's IL above that it uses ``isinst [mscorlib]System.Nullable`1`` instead of simply `isinst int32`? When I modify my code for `FillUsingOwnCode` to say `is int?` instead of `is int`, it becomes slower! Really, when we know we have a reference, possibly to a boxed value type, isn't testing for `int32` equivalent to testing for ``Nullable`1``? Does the runtime forget that boxing of ``Nullable`1`` is magical? – Jeppe Stig Nielsen Jan 27 '14 at 21:00
  • You are of course right that both are semantically identical. All variants that you tested are trivially semantically identical. The JIT is just failing to capitalize on that. The best answer that, IMO, can be given is: No, you are not missing anything. This should work but for reasons we *cannot* know it doesn't. Compilers are complex and the individual stages and transformations interact in complex ways. – usr Jan 27 '14 at 21:40

Yes, the `as` operator is meant for convenience, not performance, and is therefore slower.

For more information you can refer to this answer:

How does the "as" keyword work internally?

Sham

I think your test results are being dominated by measurement error.

Here is my program:

    static void Main()
    {
        FillUsingAsNullable();
        FillUsingOwnCode();
        FillUsingAsNullable();
        FillUsingOwnCode();
        Console.ReadLine();
    }

Here is what I get running in Release, outside the debugger, with `return 42`:

2,540,125
1,975,131
2,407,204
2,246,339

Note that there is a pretty wide variance between runs. You would probably need to run the tests several more times in series to get a good metric for performance.

Here is what happens when we change GetObject() to return (int?)42.

7,829,214
7,941,861
8,001,102
7,124,096

And again, with the same configuration:

8,243,258
7,114,879
7,932,285
7,268,167

If you really want to gather meaningful data, I suggest repeating the two tests in different orders several times over and looking at the mean and standard deviation of the results.
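That procedure (several rounds in randomized order, summarized by mean and standard deviation) could be sketched as follows. Assumptions: C# 9 or later, the question's two methods adapted to take a preallocated array and return ticks (these signatures are hypothetical), `GetObject` fixed to `return 42`, and 10 rounds chosen arbitrarily.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Adapted from the question: each method takes a preallocated array and
// returns elapsed ticks instead of printing them.
static long FillUsingAsNullable(int?[] arr)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < arr.Length; ++i)
        arr[i] = GetObject() as int?;
    return sw.ElapsedTicks;
}

static long FillUsingOwnCode(int?[] arr)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < arr.Length; ++i)
    {
        object temporary = GetObject();
        arr[i] = (temporary is int) ? (int?)temporary : null;
    }
    return sw.ElapsedTicks;
}

// The "return 42" option from the question.
static object GetObject() => 42;

// Mean and sample standard deviation of a set of tick counts.
static (double Mean, double StdDev) Summarize(List<long> samples)
{
    double mean = samples.Average();
    double sumSq = samples.Sum(s => (s - mean) * (s - mean));
    return (mean, Math.Sqrt(sumSq / (samples.Count - 1)));
}

const int Runs = 10;
var buffer = new int?[1 << 24];
var asTimes = new List<long>();
var ownTimes = new List<long>();
var rng = new Random();

// Warm-up calls so that JIT compilation is not counted in the samples.
FillUsingAsNullable(buffer);
FillUsingOwnCode(buffer);

for (int run = 0; run < Runs; ++run)
{
    // Flip a coin for the order in each round to spread out ordering effects.
    if (rng.Next(2) == 0)
    {
        asTimes.Add(FillUsingAsNullable(buffer));
        ownTimes.Add(FillUsingOwnCode(buffer));
    }
    else
    {
        ownTimes.Add(FillUsingOwnCode(buffer));
        asTimes.Add(FillUsingAsNullable(buffer));
    }
}

var (asMean, asSd) = Summarize(asTimes);
var (ownMean, ownSd) = Summarize(ownTimes);
Console.WriteLine($"as int?    mean {asMean:N0} ticks, std dev {asSd:N0}");
Console.WriteLine($"is + cast  mean {ownMean:N0} ticks, std dev {ownSd:N0}");
```

Sharing one preallocated array across all runs is a deliberate choice in this sketch: it takes array placement out of the comparison. Allocating a fresh array per call, as the question does, would keep that effect in the numbers.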

I suspect the biggest time sink in these methods is memory cache invalidation, so the result for a given test is probably determined by where exactly the array winds up allocated for each test and when GC happens to kick in during the frequent boxing allocations.
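One way to probe the GC part of this suspicion is to count gen-0 collections around a single fill with `GC.CollectionCount(0)`. A minimal sketch, assuming C# 9 or later; `FillAndCountGcs` is a hypothetical name, and `GetObject` uses the `return 42` option.

```csharp
using System;
using System.Diagnostics;

// Returns a boxed int; normally this is a new heap allocation on each call.
static object GetObject() => 42;

// Fills the array and reports the elapsed ticks together with how many
// gen-0 collections happened while the loop was running.
static (long Ticks, int Gen0Collections) FillAndCountGcs(int?[] arr)
{
    int before = GC.CollectionCount(0);
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < arr.Length; ++i)
        arr[i] = GetObject() as int?;
    sw.Stop();
    return (sw.ElapsedTicks, GC.CollectionCount(0) - before);
}

var buffer = new int?[1 << 24];
var (ticks, gen0) = FillAndCountGcs(buffer);
Console.WriteLine($"{ticks:N0} ticks, {gen0} gen-0 collections");
```

If slow runs line up with higher collection counts, that supports the GC explanation; the count says nothing directly about cache effects.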

Dan Bryant
  • Given that `static object G() { return (int?)42; }` and `static object G() { return 42; }` do return same thing, it is amazing that the former is slower than the latter! You are right about sources of error. Pretty sure my result is valid/significant, though. – Jeppe Stig Nielsen Jan 27 '14 at 21:17
  • I modified the method to return the tick count, and used code like `var rng = new Random(); long tot0 = 0L; long tot1 = 0L; int n0 = 0; int n1 = 0; for (int i = 0; i < 100; ++i) { if (rng.Next(2) == 0) { tot0 += FillUsingAsNullable(); ++n0; } else { tot1 += FillUsingOwnCode(); ++n1; } } Console.WriteLine("0 has average: " + (double)tot0 / n0); Console.WriteLine("1 has average: " + (double)tot1 / n1);`, and I still think my measurement was valid. – Jeppe Stig Nielsen Jan 27 '14 at 21:35
  • @Jeppe, boxing the nullable is more complex due to the branching for the `HasValue==false` case. It boxes to the same value in the end, but the process of performing the boxing takes more operations. This could theoretically be optimized due to it being a compile-time constant, but in practice this optimization wouldn't be very useful (since how often would you create a nullable constant?) – Dan Bryant Jan 27 '14 at 22:30
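The boxing rule behind these comments can be observed directly. Boxing an `int?` that has a value yields a boxed `int`, and boxing one without a value yields a null reference, which is why `return (int?)42;` and `return 42;` hand back the same kind of object. A short sketch, assuming C# 9 or later; the variable names are illustrative.

```csharp
using System;

int? withValue = 42;
int? withoutValue = null;

// Boxing a Nullable<int> boxes the underlying int when HasValue is true,
object boxedWithValue = withValue;
// and produces a null reference when HasValue is false.
object boxedWithoutValue = withoutValue;

Console.WriteLine(boxedWithValue.GetType());           // System.Int32
Console.WriteLine(boxedWithoutValue == null);          // True
Console.WriteLine(boxedWithValue.Equals((object)42));  // True, same as a boxed plain 42

// The boxed value can be unboxed either to int or to int?.
int asInt = (int)boxedWithValue;
int? asNullable = (int?)boxedWithValue;
Console.WriteLine($"{asInt} {asNullable}");            // 42 42
```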