
I designed the following test:

var arrayLength=5000;
object[] objArray=new object[arrayLength];

for(var x=0;x<arrayLength;x++)
{
    objArray[x]=new object();
}
objArray[4000]=null;
const int TestSize=int.MaxValue;

System.Diagnostics.Stopwatch v= new Stopwatch();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Contains(null);
}
v.Stop();
objArray.Contains(null).Dump();
v.Elapsed.ToString().Dump("Contains");

//Any ==
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Any(o=>o==null);
}
v.Stop();
objArray.Any(x=>x==null).Dump();
v.Elapsed.ToString().Dump("Any");

//Any Equals
v.Reset();
v.Start();
for(var x=0;x<10000;x++)
{
    objArray.Any(obj=>object.Equals( obj,null));
}
v.Stop();
objArray.Any(obj=>object.Equals( obj,null)).Dump();
v.Elapsed.ToString().Dump("Any Equals");

The results when null is not present:

  • Contains False 00:00:00.0606484
  • Any == False 00:00:00.7532898
  • Any object.Equals False 00:00:00.8431783

When null is present at element 4000:

  • Contains True 00:00:00.0494515
  • Any == True 00:00:00.5929247
  • Any object.Equals True 00:00:00.6700742

When null is present at element 10:

  • Contains True 00:00:00.0038035
  • Any == True 00:00:00.0025687
  • Any object.Equals True 00:00:00.0033769

So when the null is near the front, Any is slightly faster; when it's near the back, Any is much, much slower. Why?

Michael Myers
Maslow

4 Answers


Any has to invoke a delegate for every element it checks (an extra callvirt instruction that the JIT is unlikely to inline). Contains performs only the comparison itself. That's why Any is slower. As for Any looking faster than Contains when the element is found very early, I suspect the benchmark simply can't distinguish the two in that case because the timings are so close: the setup cost of the method call dominates, not the actual search.

The anonymous method:
--- C:\Users\Mehrdad\AppData\Local\Temporary Projects\ConsoleApplication1\Program.cs 
            Console.WriteLine(s.Any(a => a == 1));
00000000  xor         eax,eax 
00000002  cmp         ecx,1 
00000005  sete        al 
00000008  ret 

Relevant part of Enumerable.Any code:
...
00000051  mov         edx,eax 
00000053  mov         rcx,qword ptr [rbx+8] 
00000057  call        qword ptr [rbx+18h]   // calls the anonymous method above
0000005a  movzx       ecx,al 
0000005d  test        ecx,ecx 
...
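
To make the per-element delegate call visible without a disassembler, here is a minimal sketch that counts how often Any invokes its predicate. The class and method names (PredicateCallDemo, CountPredicateCalls) are made up for illustration; the array setup mirrors the question's test.

```csharp
using System;
using System.Linq;

public static class PredicateCallDemo
{
    // Returns how many times Any invoked the predicate before it stopped.
    // (Hypothetical helper, not part of LINQ.)
    public static int CountPredicateCalls(object[] items)
    {
        int calls = 0;
        items.Any(o => { calls++; return o == null; });
        return calls;
    }

    public static void Main()
    {
        // Same shape as the question's test: 5000 objects, null at index 4000.
        var objArray = new object[5000];
        for (var i = 0; i < objArray.Length; i++)
        {
            objArray[i] = new object();
        }
        objArray[4000] = null;

        // One delegate invocation per element up to and including the match.
        Console.WriteLine(CountPredicateCalls(objArray)); // 4001
    }
}
```

So with the null at index 4000, each Any call in the benchmark pays for 4001 delegate invocations, while with the null at index 10 it pays for only 11.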
Mehrdad Afshari
  • Are you sure? Especially if you're on x64, I highly suspect that unless you use a closure, that function will be inlined. – Ana Betts Feb 05 '10 at 15:56
  • @Paul: It's a delegate call. If they want to inline it, they'll have to generate different code for every call to `Enumerable.Any`. I'll check though. – Mehrdad Afshari Feb 05 '10 at 15:59
  • @Paul: Checked with .NET 4 Beta 1 x64 Release. Not inlined. – Mehrdad Afshari Feb 05 '10 at 16:09
  • Interesting. Where's Eric Lippert when we need him? – Ana Betts Feb 05 '10 at 16:12
  • @Paul: Why? Did he claim it'll get inlined? -- Typo btw: this is beta 2, not 1. – Mehrdad Afshari Feb 05 '10 at 16:17
  • Delegates are like interfaces: calls to them cannot get inlined. – Steven Feb 05 '10 at 16:17
  • @Paul, I don't think you need me for this one. Except of course to point out that all this speculation is *speculation* -- well-informed, reasonable speculation, but nevertheless, speculation. If you want to know where the time is being spent, *run a profiler*. – Eric Lippert Feb 05 '10 at 18:02
  • @Mehrdad - no, only that I usually end up learning something interesting about the CLR/C# compiler whenever I read something he wrote. I suppose it does make sense that this would be difficult to inline if you think about it. @Eric - Very true, and these days you'd do far better to minimize disk I/O and page faults than to try to play 80s-style "this is 60 less CPU cycles!"-style optimization – Ana Betts Feb 05 '10 at 18:12

Any is slower because Contains is tailored to the specific container you're using (Array/List/etc), so it doesn't have the overhead of firing up an IEnumerable, calling MoveNext() all the time, etc.

However, using Any makes refactoring easier: if you change the collection type, you can still use it. So I'd only switch to Contains if a profiler tells you this is an important piece of code. And if it is, you should probably end up using a smarter data structure anyway, like a HashSet, since both Any and Contains are O(n).
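
A minimal sketch of the HashSet suggestion, assuming the values are hashable and looked up many times (building the set is itself O(n), so it only pays off for repeated lookups). The names HashSetLookupDemo and BuildLookup are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetLookupDemo
{
    // Builds a set of the first `count` even numbers (hypothetical sample data).
    public static HashSet<int> BuildLookup(int count)
    {
        var values = new int[count];
        for (var i = 0; i < count; i++)
        {
            values[i] = i * 2;
        }
        return new HashSet<int>(values);
    }

    public static void Main()
    {
        var lookup = BuildLookup(5000);

        // Each lookup hashes the value instead of scanning every element.
        Console.WriteLine(lookup.Contains(8000)); // True
        Console.WriteLine(lookup.Contains(8001)); // False
    }
}
```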

Ana Betts
  • C# has *many* places where it adds an extra layer of indirection in order to handle enumeration. I have on occasion had cases where my application was much faster when I avoided this layer of indirection. However, generally that will only happen if you have loops where you're doing many very short iterations (e.g. a well-optimized implementation of Conway's Game of Life). – Brian Feb 05 '10 at 15:59

As other people have already noted, both the Contains and Any methods are extension methods of Enumerable. The big difference in performance has a couple of reasons:

First of all, you supply a delegate to the Any, which has to be called for every object, while the Contains method doesn't have to. Delegate calls are about as fast as a call to an interface method. For this reason, Any is slower.

Next, something other people seem to have missed: the Contains extension method has a performance optimization for collections implementing ICollection<T>. Because object[] implements ICollection<object>, the extension method call results in a method call on the array itself. Internally, this array Contains method uses a simple for loop to iterate over the array and compare each value. This means array bounds checking is done just once while iterating the array.

Because the Any method must call your delegate, an optimization like the one in Contains is not possible. This means that the Any method iterates over the collection using the IEnumerable interface, which leads to an interface call + an array bounds check + a delegate call on each and every element. Compare that to the array's Contains, where there are no interface calls, no delegate calls and a single bounds check.
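
Roughly, the two code paths described above look like the sketch below. This is a simplified stand-in, not the framework source: SketchContains and SketchAny are hypothetical names, argument checks are left out, and newer .NET versions may include optimizations that are not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableSketch
{
    // Simplified stand-in for Enumerable.Contains: hand off to the
    // collection's own Contains when the source is an ICollection<T>.
    public static bool SketchContains<T>(IEnumerable<T> source, T value)
    {
        var collection = source as ICollection<T>;
        if (collection != null)
        {
            return collection.Contains(value); // fast path, no per-element delegate
        }

        var comparer = EqualityComparer<T>.Default;
        foreach (var item in source)
        {
            if (comparer.Equals(item, value)) return true;
        }
        return false;
    }

    // Simplified stand-in for Enumerable.Any: enumerator plus one
    // delegate invocation per element.
    public static bool SketchAny<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (var item in source)
        {
            if (predicate(item)) return true;
        }
        return false;
    }

    public static void Main()
    {
        var objArray = new object[] { new object(), null, new object() };

        Console.WriteLine(SketchContains<object>(objArray, null));     // True
        Console.WriteLine(SketchAny<object>(objArray, o => o == null)); // True
    }
}
```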

[Update]: One last note. The reason that Any is faster with small collections (and in your case with a null value at the start of the collection) has to do with the cast to ICollection that Enumerable.Contains performs. When you do the cast to ICollection yourself, you'll see that the call to Contains is faster than Any:

for(var x=0;x<10000;x++)
{
    ICollection<object> col = objArray;
    col.Contains(null);
}
Steven

I'm guessing it has to do with the fact that Any is an extension method, part of the LINQ library, and involves using delegates (via the Func<> syntax). Any time you have to call out to a separate method (especially as a delegate) it will slow down.

  • that's precisely the reason. If we were dealing with a more complicated predicate other than a == b then the overhead of the delegate would become less and less as the complexity of the predicate increases. In this case a simple reference check is hampered by the stack push/pop of the delegate call. I'm also guessing that because it's a delegate the JIT cannot inline it into the loop. If it could - then I would have thought the results will be the same. – Andras Zoltan Feb 05 '10 at 15:54
  • Welcome to StackOverflow, Adam. Nice first answer. – Robert Harvey Feb 05 '10 at 15:58