It's worth noting from the beginning that the fact that a given method is only documented as operating on `IEnumerable<T>` does not mean that it isn't optimised for particular implementations or derived interfaces. In fact a great many of the methods in `Enumerable` take different paths for different derived interfaces and/or concrete implementations. The classic example here is that `Count()` takes a different path if the `IEnumerable<T>` it is called on implements `ICollection<T>` or `ICollection`. There are several further examples of this in the full framework, and even more in .NET Core, including some which take optimised paths for the implementation of `IOrderedEnumerable<T>` you get from calling `OrderBy()`.
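To illustrate the kind of dispatch involved, here is a simplified sketch of the pattern `Count()` uses (my own simplified version, not the actual implementation, which has further special cases):

```csharp
using System.Collections;
using System.Collections.Generic;

static class CountSketch
{
    // Simplified sketch of the type-check dispatch Enumerable.Count() performs.
    public static int Count<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> genericCollection)
            return genericCollection.Count;    // O(1): List<T>, T[], HashSet<T>, etc.
        if (source is ICollection nonGenericCollection)
            return nonGenericCollection.Count; // O(1): non-generic fallback
        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
            while (e.MoveNext())
                count++;                       // O(n): walk the whole sequence
        return count;
    }
}
```

The method is declared against `IEnumerable<T>` alone, yet a `List<T>` passed to it never gets enumerated at all.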
Some of those optimisations are my doing, because my hobby these days is contributing to .NET Core, particularly to Linq, and particularly performance improvements (though obviously if I'm hacking on something I need to increase test coverage on the bits I'm touching, and when doing so turns up minor bugs they get prioritised over performance improvements).
When it comes to `IOrderedEnumerable<T>`, I've done things like change `.OrderBy(someLambda).Skip(j).Take(k)` (a common paging idiom) from O(n log n) time to compute and O(j + k) time to enumerate, to O(n + k log k) time to compute and O(k) time to enumerate; and `.OrderBy(someLambda).First()` from O(n) space and O(n log n) time to O(1) space and O(n) time; and so on.
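The `First()` improvement works because the first element of a stable sort by key is simply the first element carrying the minimal key, which a single pass can find without buffering or sorting. A rough sketch of the idea (the helper name is mine, not what's in Linq):

```csharp
using System;
using System.Collections.Generic;

static class OrderBySketch
{
    // Sketch: computes what source.OrderBy(keySelector).First() returns,
    // in O(1) space and O(n) time, via a single minimum-by-key scan.
    public static TSource FirstByOrder<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        Comparer<TKey> comparer = Comparer<TKey>.Default;
        using IEnumerator<TSource> e = source.GetEnumerator();
        if (!e.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements");
        TSource best = e.Current;
        TKey bestKey = keySelector(best);
        while (e.MoveNext())
        {
            TKey key = keySelector(e.Current);
            // Strict "less than" keeps the first of any key-tied elements,
            // matching OrderBy's stable sort followed by First().
            if (comparer.Compare(key, bestKey) < 0)
            {
                best = e.Current;
                bestKey = key;
            }
        }
        return best;
    }
}
```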
I might look at improving other methods, and of course if I don't it's quite possible someone else would.
If I do, I would not do it as you suggest.
Firstly, a separate overload for `IOrderedEnumerable<T>` would require adding a method to the public API while only covering some cases (maybe what we're given as an `IEnumerable<T>` is in fact an `IOrderedEnumerable<T>`). Much better to just have the one overload for `IEnumerable<T>` and detect the `IOrderedEnumerable<T>` case.
Secondly, to use binary search we would have to know the means by which the `IOrderedEnumerable<T>` was sorted. This is possible with the `OrderedEnumerable<TElement, TKey>` created by calls to `OrderBy()`, but not more generally.
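To make that concrete, here's a minimal sketch (my own names, not anything in Linq) of what a binary-search-based membership test would need: the search must use exactly the key comparison the sort used, and because distinct elements can share a key, every element in a run of equal keys may still need checking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BinarySearchSketch
{
    // Sketch: membership test over a buffer sorted by key, using binary search.
    // Only possible because we hold the keySelector the sort used.
    public static bool SortedContains<TElement, TKey>(
        IEnumerable<TElement> source, Func<TElement, TKey> keySelector, TElement item)
    {
        TElement[] buffer = source.ToArray();
        TKey[] keys = buffer.Select(keySelector).ToArray();
        // Sort buffer by key, as OrderBy would. (Array.Sort is unstable,
        // unlike OrderBy, but stability doesn't affect membership.)
        Array.Sort(keys, buffer);
        TKey itemKey = keySelector(item);
        int index = Array.BinarySearch(keys, itemKey);
        if (index < 0)
            return false; // no element even has a matching key
        // Distinct elements can share a key, and BinarySearch may land anywhere
        // in a run of equal keys, so scan the whole run in both directions:
        // this is the O(n) worst case when every element shares the same key.
        Comparer<TKey> cmp = Comparer<TKey>.Default;
        EqualityComparer<TElement> eq = EqualityComparer<TElement>.Default;
        for (int i = index; i >= 0 && cmp.Compare(keys[i], itemKey) == 0; i--)
            if (eq.Equals(buffer[i], item))
                return true;
        for (int i = index + 1; i < keys.Length && cmp.Compare(keys[i], itemKey) == 0; i++)
            if (eq.Equals(buffer[i], item))
                return true;
        return false;
    }
}
```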
Thirdly, it would not be the biggest possible gain.
The current costs of `source.OrderBy(someLambda).Contains(someItem)` are as follows:

- Buffer `source`: O(n) space, O(n) time.
- Sort the buffer: O(n log n) time (average; O(n²) worst case).
- Find an item that matches `someItem`, or confirm none exists: O(n) time.
If `Contains()` were optimised to use binary search, that would become:

- Buffer `source`: O(n) space, O(n) time.
- Sort the buffer: O(n log n) time (average; O(n²) worst case).
- Find an item that matches `someItem`, or confirm none exists: O(log n) time (average; O(n) worst case, because a matching item may sort at the same level as all the other elements and have to be compared with each of them).
However, that's a complete waste. If we want to optimise `Contains()` (and a great many other aggregate methods, for that matter) the optimal strategy would be:
- Call `source.Contains(someItem)` and return the result. This will at worst be O(n) time and O(1) space, though it may be O(log n) or O(1) time if `source` is, for example, a `HashSet<T>` (a case that `Contains()` is already optimised for). In both theory and practice it will end up being faster than the buffering step above.
Implementing that change would be considerably less work, and a much bigger gain.
I've considered this, and might indeed submit such a PR, but I'm not yet sure if on balance it's worth it (and hence what my opinion would be if someone else submits such a PR), since it's almost always easier for the caller to turn `….OrderBy(foo).Contains(bar)` into `.Contains(bar)` themselves, and the check needed to optimise for such a case would be cheap, but not entirely free.
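That caller-side rewrite is safe precisely because ordering never changes which elements a sequence contains. A trivial illustration (the method names are mine):

```csharp
using System.Collections.Generic;
using System.Linq;

static class ContainsRewrite
{
    // The version with the redundant OrderBy: buffers and sorts first.
    public static bool SlowContains(IEnumerable<int> source, int item) =>
        source.OrderBy(x => x).Contains(item); // O(n) space, O(n log n) time

    // The rewrite: the OrderBy cannot affect the answer, so drop it.
    public static bool FastContains(IEnumerable<int> source, int item) =>
        source.Contains(item);                 // O(1) space, at worst O(n) time
}
```

The two always agree on the result; only the cost differs.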