2

Background:

Linq-To-Objects has the extension method Count() (the overload not taking a predicate). Of course sometimes when a method requires only an IEnumerable<out T> (to do Linq), we will really pass a "richer" object to it, such as an ICollection<T>. In that situation it would be wasteful to actually iterate through the entire collection (i.e. get the enumerator and "move next" a whole bunch of times) to determine the count, for there is a property ICollection<T>.Count for this purpose. And this "shortcut" has been used in the BCL since the beginning of Linq.

Now, since .NET 4.5 (of 2012), there is another very nice interface, namely IReadOnlyCollection<out T>. It is like the ICollection<T> except that it only includes those member that return a T. For that reason it can be covariant in T ("out T"), just like IEnumerable<out T>, and that is really nice when item types can be more or less derived. But the new interface has its own property, IReadOnlyCollection<out T>.Count. See elsewhere on SO why these Count properties are distinct (instead of just one property).

The question:

Linq's method Enumerable.Count(this source) does check for ICollection<T>.Count, but it does not check for IReadOnlyCollection<out T>.Count.

Given that it is really natural and common to use Linq on read-only collections, would it be a good idea to change the BCL to check for both interfaces? I guess it would require one additional type check.

And would that be a breaking change (given that they did not "remember" to do this from the 4.5 version where the new interface was introduced)?

Sample code

Run the code:

    var x = new MyColl();
    if (x.Count() == 1000000000)
    {
    }

    var y = new MyOtherColl();
    if (y.Count() == 1000000000)
    {
    }

where MyColl is a type implementing IReadOnlyCollection<> but not ICollection<>, and where MyOtherColl is a type implementing ICollection<>. Specifically I used the simple/minimal classes:

class MyColl : IReadOnlyCollection<Guid>
{
  public int Count
  {
    get
    {
      Console.WriteLine("MyColl.Count called");
      // Just for testing, implementation irrelevant:
      return 0;
    }
  }

  public IEnumerator<Guid> GetEnumerator()
  {
    Console.WriteLine("MyColl.GetEnumerator called");
    // Just for testing, implementation irrelevant:
    return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    Console.WriteLine("MyColl.System.Collections.IEnumerable.GetEnumerator called");
    return GetEnumerator();
  }
}
class MyOtherColl : ICollection<Guid>
{
  public int Count
  {
    get
    {
      Console.WriteLine("MyOtherColl.Count called");
      // Just for testing, implementation irrelevant:
      return 0;
    }
  }

  public bool IsReadOnly
  {
    get
    {
      return true;
    }
  }

  public IEnumerator<Guid> GetEnumerator()
  {
    Console.WriteLine("MyOtherColl.GetEnumerator called");
    // Just for testing, implementation irrelevant:
    return ((IReadOnlyCollection<Guid>)(new Guid[] { })).GetEnumerator();
  }

  System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  {
    Console.WriteLine("MyOtherColl.System.Collections.IEnumerable.GetEnumerator called");
    return GetEnumerator();
  }

  public bool Contains(Guid item) { throw new NotImplementedException(); }
  public void CopyTo(Guid[] array, int arrayIndex) { throw new NotImplementedException(); }
  public bool Remove(Guid item) { throw new NotSupportedException(); }
  public void Add(Guid item) { throw new NotSupportedException(); }
  public void Clear() { throw new NotSupportedException(); }
}

and got the output:

MyColl.GetEnumerator called
MyOtherColl.Count called

from the code run, which shows that the "shortcut" was not used in the first case (IReadOnlyCollection<out T>). Same result is seen in 4.5 and 4.5.1.


UPDATE after comment elsewhere on Stack Overflow by user supercat.

Linq was introduced in .NET 3.5 (2008), of course, and the IReadOnlyCollection<> was introduced only in .NET 4.5 (2012). However, in between, another feature, covariance in generics was introduced, in .NET 4.0 (2010). As I said above, IEnumerable<out T> became a covariant interface. But ICollection<T> stayed invariant in T (since it contains members like void Add(T item);).

Already in 2010 (.NET 4) this had the consequence that if Linq's Count extension method was used on a source of compile-time type IEnumerable<Animal> where the actual run-time type was for example List<Cat>, say, which is surely an IEnumerable<Cat> but also, by covariance, an IEnumerable<Animal>, then the "shortcut" was not used. The Count extension method checks only if the run-time type is an ICollection<Animal>, which it is not (no covariance). It can't check for ICollection<Cat> (how would it know what a Cat is, its TSource parameter equals Animal?).

Let me give an example:

static void ProcessAnimals(IEnuemrable<Animal> animals)
{
    int count = animals.Count();  // Linq extension Enumerable.Count<Animal>(animals)
    // ...
}

then:

List<Animal> li1 = GetSome_HUGE_ListOfAnimals();
ProcessAnimals(li1);  // fine, will use shortcut to ICollection<Animal>.Count property

List<Cat> li2 = GetSome_HUGE_ListOfCats();
ProcessAnimals(li2);  // works, but inoptimal, will iterate through entire List<> to find count

My suggested check for IReadOnlyCollection<out T> would "repair" this issue too, since that is one covariant interface which is implemented by List<T>.

Conclusion:

  1. Also checking for IReadOnlyCollection<TSource> would be beneficial in cases where the run-time type of source implements IReadOnlyCollection<> but not ICollection<> because the underlying collection class insists on being a read-only collection type and therefore wishes to not implement ICollection<>.
  2. (new) Also checking for IReadOnlyCollection<TSource> is beneficial even when the type of source is both ICollection<> and IReadOnlyCollection<>, if generic covariance applies. Specifically, the IEnumerable<TSource> may really be an ICollection<SomeSpecializedSourceClass> where SomeSpecializedSourceClass is convertible by reference conversion to TSource. ICollection<> is not covariant. However, the check for IReadOnlyCollection<TSource> will work by covariance; any IReadOnlyCollection<SomeSpecializedSourceClass> is also an IReadOnlyCollection<TSource>, and the shortcut will be utilized.
  3. The cost is one additional run-time type check per call to Linq's Count method.
Community
  • 1
  • 1
Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
  • 1
    It seems that the check for IReadOnlyCollection is just missing in the official Count() function. They probably forgot to implement it? But its optimization after all. – BlueM Apr 08 '14 at 14:54
  • @BlueM So my question was: Would it not be a good idea to check for `IReadOnlyCollection<>` (also)? – Jeppe Stig Nielsen Apr 08 '14 at 15:27
  • 1
    I mean you have to deal with one additional type check for **every** Count() call on an IEnumerable. It may not be wise if IReadOnlyInterface is more rare. – BlueM Apr 08 '14 at 15:29
  • 2
    @JeppeStigNielsen The discussion is whether there is any gain versus the cost of checking for the common cases. Optimisations are almost always a trade off. In this case it is trading extra time checking for and casting types to avoid iterating. I would surmise the common scenario is the number of regular lists / collections far outweighs the number of read-only counter-parts. In certain cases I have implemented `ICollection` on legacy custom lists just to take advantage of the LINQ optimisations in this area. – Adam Houldsworth Apr 08 '14 at 15:31
  • @AdamHouldsworth True, but today I became aware that this also has benefits because of covariance in generics. For example when a `List` is typed at compile-time as an `IEnumerable` (so `Animal` is a base class of `Cat`), the `Count` extension won't find the "shortcut". Checking for a covariant interface like `IReadOnlyCollection` would help with that. Question updated. – Jeppe Stig Nielsen Apr 30 '14 at 15:18
  • @JeppeStigNielsen: How many classes implement `IReadOnlyCollection` but not the non-generic `ICollection`? I would think the latter would be a better one to look for, though perhaps what should have happened would have been to create a non-generic `ICountable` with a `Count` property, and have all the other collection types inherit from that. I think implementations of other collection types' `Count` property would be regarded as implementations of `ICountable.Count`, so the change could have been compatible with existing code. – supercat Apr 30 '14 at 15:56
  • @supercat I guess user-defined classes will often implement the relevant generic interface only. Who bothers create a whole lot of explicit interface implementations to please a more or less obsolete non-generic interface that won't be used anyway? – Jeppe Stig Nielsen Apr 30 '14 at 16:38
  • 2
    @JeppeStigNielsen: Finding out the number of items in a collection should be a type-agnostic operation, and the non-generic `Collection` is the only type-agnostic interface providing that ability which many collections would be able to implement. It would have been better for there to have been a non-generic `ICountable` with a `Count` property, and to have `IReadableCollection` inherit from that and `IEnumerable`, but that's not how MS did things. – supercat Apr 30 '14 at 17:53

2 Answers2

3

In many cases a class that implements IReadOnlyCollection<T> will also implement ICollection<T>. So you will still profit from the Count property shortcut.

See ReadOnlyCollection for example.

public class ReadOnlyCollection<T> : IList<T>, 
    ICollection<T>, IList, ICollection, IReadOnlyList<T>, IReadOnlyCollection<T>, 
    IEnumerable<T>, IEnumerable

Since its bad practice to check for other interfaces to get access beyond the given readonly interface it should be ok this way.

Implementing an additional type check for IReadOnlyInterface<T> in Count() will be additional ballast for every call on an object which doesn't implement IReadOnlyInterface<T>.

BlueM
  • 6,523
  • 1
  • 25
  • 30
  • Today I became aware of another benefit, namely if the above is a `ReadOnlyCollection` which is typed as an `IEnumerable` by covariance (`Animal` is a base class of `Cat`), then the check for `ICollection` will actually return "false". But because of covariance a check for `IReadOnlyCollection` will return "true" since, by covariance, an `IReadOnlyCollection` is an `IReadOnlyCollection`. See updated question (question is now too long ...). – Jeppe Stig Nielsen Apr 30 '14 at 15:15
2

Based on the MSDN documentation, ICollection<T> is the only type that gets this special treatment:

If the type of source implements ICollection<T>, that implementation is used to obtain the count of elements. Otherwise, this method determines the count.

I'm guessing they didn't see it as worthwhile to mess with the LINQ codebase (and its spec) for the sake of this optimization. There are lots of CLR types that have their own Count property, but LINQ can't account for all of them.

JLRishe
  • 99,490
  • 19
  • 131
  • 169