There's a great answer already, but to add a few things:
Enumerating the results of `OrderBy()` obviously can't yield an element until it has processed all elements: until it has seen the last input element, it can't know that that element isn't the one it must yield first. It also must work on sources that can't be repeated or that give different results each time. As such, even if some sort of zeal had made the developers want to find the nth element anew on each cycle, buffering is a logical requirement.
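For instance, here the element that must be yielded first is the last one the source produces:

```csharp
using System;
using System.Linq;

int[] source = { 2, 3, 1 };

// 1 is the last element of the source but the first of the sorted result,
// so OrderBy cannot yield anything until it has consumed the whole source.
foreach (int i in source.OrderBy(x => x))
    Console.WriteLine(i); // prints 1, 2, 3
```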
The quicksort is lazy in two regards, though. One is that rather than sorting the elements to return according to the keys from the delegate passed to the method, it sorts a mapping:
- Buffer all the elements.
- Get the keys. Note that this means the delegate is run only once per element. Among other things, it means that non-pure key selectors won't cause problems.
- Get a map of the numbers 0 to n - 1, one per element.
- Sort the map.
- Enumerate through the map, yielding the associated element each time.
So there is a sort of laziness in the final sorting of elements. This is significant in cases where moving elements is expensive (large value types).
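As a rough sketch of that approach (the names here are mine, not the framework's; the real `OrderedEnumerable<TElement>` is more involved, but the shape is the same):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderBySketch
{
    public static IEnumerable<TSource> OrderByViaMap<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        // 1. Buffer all the elements.
        TSource[] buffer = source.ToArray();

        // 2. Get the keys; the selector runs exactly once per element.
        var keys = new TKey[buffer.Length];
        for (int i = 0; i < buffer.Length; i++)
            keys[i] = keySelector(buffer[i]);

        // 3. Get a map of the numbers 0 to n - 1.
        var map = new int[buffer.Length];
        for (int i = 0; i < map.Length; i++)
            map[i] = i;

        // 4. Sort the map by comparing keys; ties are broken by original
        //    index so the sort stays stable. The elements themselves never move.
        var comparer = Comparer<TKey>.Default;
        Array.Sort(map, (x, y) =>
        {
            int c = comparer.Compare(keys[x], keys[y]);
            return c != 0 ? c : x - y;
        });

        // 5. Enumerate through the map, yielding the associated element.
        foreach (int index in map)
            yield return buffer[index];
    }
}
```

Since the sort moves 4-byte indices rather than the elements themselves, large value types stay put in the buffer.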
There is of course also laziness in that none of the above is done until after the first attempt to enumerate; until you call `MoveNext()` the first time, it won't have happened.
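That deferred execution is easy to observe:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 3, 1 };
IEnumerable<int> sorted = list.OrderBy(x => x); // nothing buffered or sorted yet

list.Add(0); // still observed, because nothing has been enumerated

foreach (int i in sorted)  // the first MoveNext() buffers and sorts
    Console.WriteLine(i);  // prints 0, 1, 3
```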
In .NET Core there is further laziness building on that, depending on what you then do with the results of `OrderBy`. Since `OrderBy` contains information about how to sort rather than the sorted buffer, the class returned by `OrderBy` can do something else with that information other than quicksorting:
- The most obvious is `ThenBy`, which all implementations provide. When you call `ThenBy` or `ThenByDescending` you get a new, similar class with different information about how to sort, and the sort the `OrderBy` result could have done probably never will happen. (A sketch of how that composition can work follows this list.)
- `First()` and `Last()` don't need to sort at all. Logically, `source.OrderBy(del).First()` is a variant of `source.Min()` where `del` contains the information to determine what defines "less than" for that `Min()`. Therefore if you call `First()` on the results of an `OrderBy()`, that's exactly what is done. The laziness of `OrderBy` allows it to do this instead of quicksorting, which means O(n) time complexity and O(1) space complexity instead of O(n log n) and O(n) respectively. (This is also sketched after the list.)
- `Skip()` and `Take()` define a subsequence of a sequence, which with `OrderBy` must conceptually happen after the sort. But since they are lazy too, what can be returned is an object that knows how to sort, how many to skip, and how many to take. As such, a partial quicksort can be used so that the source need only be partially sorted: if a partition falls entirely outside the range that will be returned, there's no point sorting it.
- `ElementAt()` places more of a burden than `First()` or `Last()` but again doesn't require a full quicksort. Quickselect can be used to find just one result: if you're looking for the 3rd element and you've partitioned a set of 200 elements around the 90th element, then you only need to look further in the first partition and can ignore the second partition from now on. Best-case and average-case time complexity is O(n). (See the partition-based sketch after this list.)
- The above can be combined, so e.g. `.Skip(10).First()` is equivalent to `.ElementAt(10)` and can be treated as such.
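To illustrate the `ThenBy` point, here's a minimal sketch, with hypothetical names, of how an object describing "how to sort" can compose comparisons without sorting anything; the real `IOrderedEnumerable<T>` machinery differs in detail but works to the same effect:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: each SortSpec carries only *how* to sort; nothing is sorted.
class SortSpec<T>
{
    private readonly Comparison<T> comparison;
    private SortSpec(Comparison<T> comparison) => this.comparison = comparison;

    public static SortSpec<T> OrderBy<TKey>(Func<T, TKey> key) =>
        new SortSpec<T>((a, b) =>
            Comparer<TKey>.Default.Compare(key(a), key(b)));

    // ThenBy doesn't sort; it returns a new spec whose comparison falls back
    // to the new key only when the existing comparison reports a tie.
    public SortSpec<T> ThenBy<TKey>(Func<T, TKey> key) =>
        new SortSpec<T>((a, b) =>
        {
            int c = comparison(a, b);
            return c != 0 ? c : Comparer<TKey>.Default.Compare(key(a), key(b));
        });
}
```

Calling `ThenBy` just builds a new spec; a sort only happens if the result is eventually enumerated.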
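And a sketch of the `First()` shortcut, with a hypothetical `FirstByKey` helper: one pass keeping the best element seen so far, O(n) time and O(1) space:

```csharp
using System;
using System.Collections.Generic;

static class OrderedFirstSketch
{
    // Hypothetical equivalent of source.OrderBy(keySelector).First().
    public static TSource FirstByKey<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var comparer = Comparer<TKey>.Default;
        using var e = source.GetEnumerator();
        if (!e.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements");

        TSource best = e.Current;
        TKey bestKey = keySelector(best);
        while (e.MoveNext())
        {
            TKey key = keySelector(e.Current);
            if (comparer.Compare(key, bestKey) < 0) // strictly less, so the
            {                                       // first of equal elements
                best = e.Current;                   // wins, as a stable
                bestKey = key;                      // OrderBy would have it
            }
        }
        return best;
    }
}
```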
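For `ElementAt()`, a quickselect sketch (again hypothetical names; for brevity this version mutates the array it's given, where the real implementation would work over its own private buffer):

```csharp
using System;
using System.Collections.Generic;

static class QuickSelectSketch
{
    // Returns the k-th smallest element (0-based). Partitions as quicksort
    // would, but recurses only into the side that contains index k.
    public static T SelectKth<T>(T[] items, int k)
    {
        var comparer = Comparer<T>.Default;
        int lo = 0, hi = items.Length - 1;
        while (true)
        {
            // Lomuto partition around the last element of the current range.
            T pivot = items[hi];
            int store = lo;
            for (int i = lo; i < hi; i++)
            {
                if (comparer.Compare(items[i], pivot) < 0)
                {
                    (items[store], items[i]) = (items[i], items[store]);
                    store++;
                }
            }
            (items[store], items[hi]) = (items[hi], items[store]);

            if (k == store) return items[store];  // found it
            if (k < store) hi = store - 1;        // only the left partition
            else lo = store + 1;                  // only the right partition
        }
    }
}
```

The `Skip()`/`Take()` partial sort works on the same principle, recursing only into partitions that overlap the requested range, and `.Skip(10).First()` reduces to `k = 10`.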
All of these exceptions to getting the entire buffer and sorting it all have one thing in common: they were all implemented after identifying a way in which the correct result can be returned after making the computer do less work*. That `new[] {1, 2, 3, 4}.Where(i => i % 2 == 0)` will yield the `2` before it has seen the `4` (or even the `3` it won't yield) comes from the same general principle. It just comes at it more easily (though there are still specialised variants of `Where()` results behind the scenes to provide other optimisations).
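You can watch that interleaving happen with a source that reports as it produces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

foreach (int i in Source().Where(n => n % 2 == 0))
{
    Console.WriteLine($"got {i}");
    break; // stop after the first match; 3 and 4 are never produced
}

static IEnumerable<int> Source()
{
    foreach (int i in new[] { 1, 2, 3, 4 })
    {
        Console.WriteLine($"producing {i}");
        yield return i;
    }
}
// Output: producing 1, producing 2, got 2
```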
But note that `Enumerable.Range(1, 10000).Where(i => i >= 10000)` scans through 9,999 elements before yielding its first result. Really it's not all that different to `OrderBy`'s buffering; they're both bringing you the next result as quickly as they can†, and what differs is just what that means.
*And also identifying that the effort to detect and make use of the features of a particular case is worth it. E.g. many aggregate calls like `Sum()` could be optimised on the results of `OrderBy` by skipping the ordering completely. But this can generally be realised by the caller, who can just leave out the `OrderBy`; so while adding that would make most calls to `Sum()` slightly slower to make that one case much faster, the case that benefits shouldn't really be happening anyway.
†Well, pretty much as quickly. It would be possible to get the first results back more quickly than `OrderBy` does (once the left-most part of the sequence is sorted, start handing out results), but that comes at a cost that would affect the later results, so it isn't necessarily a better trade-off.