Why does the LINQ method Any not check Count?

Question

If we look at the source code of the extension method Any, we will see that it always uses the enumerator:

public static bool Any<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
        throw Error.ArgumentNull(nameof (source));
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
            return true;
    }
    return false;
}

Wouldn't it be better (for performance) to check the Count property when the collection is an IList, as the SingleOrDefault method does, for example:

public static TSource SingleOrDefault<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
        throw Error.ArgumentNull(nameof(source));
    IList<TSource> sourceList = source as IList<TSource>;
    if (sourceList != null)
    {
        switch (sourceList.Count)
        {
            case 0:
                return default(TSource);
            case 1:
                return sourceList[0];
        }
    }
    else
    {
        //...
    }
    throw Error.MoreThanOneElement();
}

I mean, it could look like this:

private static bool Any<TSource>(IEnumerable<TSource> source)
{
    if (source == null)
        throw new ArgumentNullException(nameof(source));

    IList<TSource> sourceList = source as IList<TSource>;

    if (sourceList != null)
    {
        return sourceList.Count != 0;
    }

    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
            return true;
    }
    return false;
}

I wrote a benchmark to test it:

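using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace AnyTests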
{

    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Test>();
        }
    }

    public class Test
    {
        private readonly List<int> list1 = new List<int>(new[] { 1, 2, 3, 4, 5 });

        private readonly IEnumerable<int> list2 = GetCollection();

        private static IEnumerable<int> GetCollection()
        {
            yield return 1;
        }

        [Benchmark]
        public void TestLinqAnyList()
        {
            Enumerable.Any(list1);
        }

        [Benchmark]
        public void TestNewAnyList()
        {
            NewAny(list1);
        }

        [Benchmark]
        public void TestLinqAnyEnumerable()
        {
            Enumerable.Any(list2);
        }

        [Benchmark]
        public void TestNewAnyEnumerable()
        {
            NewAny(list2);
        }


        private static bool NewAny<TSource>(IEnumerable<TSource> source)
        {
            if (source == null)
                throw new ArgumentNullException(nameof(source));

            IList<TSource> sourceList = source as IList<TSource>;

            if (sourceList != null)
            {
                return sourceList.Count != 0;
            }

            using (IEnumerator<TSource> enumerator = source.GetEnumerator())
            {
                if (enumerator.MoveNext())
                    return true;
            }
            return false;
        }
    }
}

The results show that it is about twice as fast for the list:

// * Summary *

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical cores and 4 physical cores
Frequency=3515624 Hz, Resolution=284.4445 ns, Timer=TSC
  [Host]     : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
  DefaultJob : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0


                Method |     Mean |     Error |    StdDev |
---------------------- |---------:|----------:|----------:|
       TestLinqAnyList | 26.80 ns | 0.1382 ns | 0.1154 ns |
        TestNewAnyList | 12.75 ns | 0.0480 ns | 0.0426 ns |
 TestLinqAnyEnumerable | 18.03 ns | 0.0947 ns | 0.0886 ns |
  TestNewAnyEnumerable | 23.51 ns | 0.0913 ns | 0.0762 ns |

For an IList it is about twice as fast (12.75 ns vs 26.80 ns); for a plain IEnumerable it is about 30% slower (23.51 ns vs 18.03 ns).

So, the question: what is the reason for using this optimization in the SingleOrDefault method but not in Any?

There is an issue in the .NET Core repository about that: https://github.com/dotnet/corefx/issues/23700. You can read first-hand information there about the benefits and drawbacks (in particular, Jon Hanna's comment). You will find some surprising things, for example that for arrays the current version is actually faster than checking `ICollection.Count` (while for lists it's a bit slower). – Evk Apr 05 '18 at 07:02

mjwills · Accepted Answer · 2018-04-05T03:23:24.820

6

The assumption behind your question is likely:

Count is fast, why not use it?

One plausible answer to why Any doesn't use it is that Count is not always fast. The advantage of the implementation they chose is that it has a relatively stable and low cost (i.e. roughly O(1)). It may not be as fast as Count in all instances, however (as you have identified).

There is no guarantee that the class that implements IList or ICollection will have a fast Count property. ConcurrentDictionary, for example, is generally slower for Count > 0 than the existing Any implementation.
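
To illustrate the point, here is a minimal sketch of a collection whose Count is O(n). The type `SlowCountChain<T>`, its `NodesVisitedByCount` counter, and the helper `EnumeratorAny.HasAny` are hypothetical names made up for this example; they are not part of LINQ or of the code in the question. The helper mirrors the enumerator-based Any shown in the question.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: a singly linked chain that does not cache
// its size, so every call to Count walks all nodes.
public sealed class SlowCountChain<T> : ICollection<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Next;
    }

    private Node head;

    // Total nodes visited by Count so far; kept only for illustration.
    public int NodesVisitedByCount { get; private set; }

    // O(1): prepends a node.
    public void Add(T item) => head = new Node { Value = item, Next = head };

    // O(n): walks the whole chain on every call.
    public int Count
    {
        get
        {
            int n = 0;
            for (Node node = head; node != null; node = node.Next)
            {
                n++;
                NodesVisitedByCount++;
            }
            return n;
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (Node node = head; node != null; node = node.Next)
            yield return node.Value;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Remaining ICollection<T> members, implemented minimally.
    public bool IsReadOnly => false;
    public void Clear() => head = null;
    public bool Remove(T item) => throw new NotSupportedException();

    public bool Contains(T item)
    {
        foreach (T value in this)
            if (EqualityComparer<T>.Default.Equals(value, item))
                return true;
        return false;
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        foreach (T value in this)
            array[arrayIndex++] = value;
    }
}

public static class EnumeratorAny
{
    // Same approach as the Any shown in the question: at most one MoveNext.
    public static bool HasAny<T>(IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        using (IEnumerator<T> enumerator = source.GetEnumerator())
            return enumerator.MoveNext();
    }
}
```

With a chain of 1000 items, `EnumeratorAny.HasAny(chain)` touches a single node, while `chain.Count != 0` visits all 1000. Nothing in the `ICollection<T>` contract tells the caller which of the two is cheaper.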

Additionally, your code that uses IList should likely use ICollection since your code doesn't need the extra features that IList provides access to.
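
As a rough sketch of that suggestion, the variant from the question could test for `ICollection<TSource>` instead. The class and method names (`AnySketch`, `AnyViaCollection`) are invented for this example. The practical difference is that types such as `HashSet<T>`, which implement `ICollection<T>` but not `IList<T>`, would also take the Count path.

```csharp
using System;
using System.Collections.Generic;

public static class AnySketch
{
    // Hypothetical variant of the question's NewAny: only Count is needed,
    // so the narrower ICollection<TSource> interface is sufficient.
    public static bool AnyViaCollection<TSource>(IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            // Cost depends entirely on how the collection implements Count.
            return collection.Count != 0;
        }

        // Fallback: same enumerator-based check as the original Any.
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext();
        }
    }
}
```

The caveat from above still applies: this is only a win when the concrete collection's Count is cheap.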

edited Apr 05 '18 at 03:23

answered Apr 05 '18 at 02:23

mjwills


Spot on. LINQ is built for IEnumerable, not IList. However I would think it would be trivial to support the most common collection classes in the .NET framework. – theMayer Apr 05 '18 at 02:26
In a framework, alas, very few things are trivial. :( https://blogs.msdn.microsoft.com/ericgu/2004/01/12/minus-100-points/ – mjwills Apr 05 '18 at 02:30
But anyway, `SingleOrDefault` uses `Count`. Why, if it can be slow? Just don't understand where it's ok to use "slow" methods, and where not – Backs Apr 05 '18 at 02:45
Yep, and that means that `SingleOrDefault` would be slow for any type that implements `IList` and has a slow `Count` property @Backs . The nature of LINQ is that it is a slightly leaky abstraction: for each method they are making tradeoffs. Sometimes they get the tradeoffs 'right', sometimes 'wrong' (e.g. https://stackoverflow.com/a/47631641/34092). It is hard to build this kind of stuff since the interface (`IList`, `ICollection` or whatever) tells you whether something is **possible**, but not its performance cost. The benefit of the existing `Any` is that it has a relatively stable cost. – mjwills Apr 05 '18 at 03:16
@mjwills I've written a small post about this problem: http://blog.rogatnev.net/2018/06/16/Any-vs-Count.html – Backs Jun 16 '18 at 16:55
