
I was wondering about the complexity of the quicksort implemented in this post.

Stuart Marks says it's O(N^2 log N). But is it really? I don't understand this passage:

It seems to me -- and once again, I'm not a C# or .NET expert -- that this will cause certain innocuous-looking calls, such as pivot selection via ints.First(), to be more expensive than they look. At the first level, of course, it's O(1). But consider a partition deep in the tree, at the right-hand edge. To compute the first element of this partition, the entire source has to be traversed, an O(N) operation. But since the partitions above are lazy, they must be recomputed, requiring O(lg N) comparisons. So selecting the pivot would be an O(N lg N) operation, which is as expensive as an entire sort.

Why would ints.First() be an O(N) operation? I think it's always O(1). And why would the partitions above in the tree of IEnumerables have to be recomputed? That doesn't make any sense to me either. Doesn't IEnumerable.Where return a new IEnumerable? It seems to me that the time complexity of this algorithm is still O(N log N), and that the space complexity is O(N log N) as well, instead of the O(N) we have when we sort in place.

All in all, is Stuart Marks right, or am I?

Coder-Man

1 Answer


IEnumerable<> doesn't cache. If it is backed by a collection (like new int[5].AsEnumerable()), you can reuse it as many times as you want, but an IEnumerable<> could in theory be generated piecemeal, one element at a time, with only the current element in memory and the previous ones forgotten. There is no guarantee that enumerating an IEnumerable<> twice will return the same data, nor even that it can be enumerated twice. The question you linked is quite misguided, and shows that its poster didn't know what he was talking about.
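A small sketch to make the "doesn't cache" point concrete. `Where()` does return a new `IEnumerable<int>`, but that object is a filter over its source, not a copy of it. The names `DeferredDemo`, `Source`, `Produced` and the two helper methods are hypothetical, used only for illustration: a counter records how many elements the generator actually produces.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // How many elements Source() has produced so far, across all enumerations.
    public static int Produced;

    // A lazily generated sequence: no element is stored anywhere; each one is
    // computed only when an enumerator asks for it.
    public static IEnumerable<int> Source(int n)
    {
        for (int i = 1; i <= n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Where() returns a new IEnumerable<int>, but that object is only a filter
    // over Source(); it caches nothing, so each pass re-runs the generator.
    public static int ProducedByTwoPasses(int n)
    {
        Produced = 0;
        IEnumerable<int> evens = Source(n).Where(x => x % 2 == 0);
        _ = evens.Count(); // first pass: produces all n elements
        _ = evens.Count(); // second pass: produces all n elements again
        return Produced;   // 2 * n
    }

    // First() stops at the first match, but it still has to pull every
    // element the filter rejects before that match.
    public static int ProducedByFirstAbove(int n, int threshold)
    {
        Produced = 0;
        _ = Source(n).Where(x => x > threshold).First();
        return Produced;   // threshold + 1, when threshold < n
    }
}
```

ProducedByTwoPasses(5) returns 10 rather than 5, and ProducedByFirstAbove(1000, 999) returns 1000: First() does O(1) work of its own, but its real cost includes every element that has to be pulled through the filter before the first match.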

The proposed QuickSort(IEnumerable<int> ints) takes an IEnumerable<int> ints parameter. The method has no external guarantee that ints can be enumerated twice, or that even enumerating it once won't be an O(N) operation.
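The linked code isn't reproduced here, so the following is a sketch under an assumption: a lazy LINQ quicksort of roughly this shape, applying Any(), First() and two Where() filters to ints. The `Counted` wrapper and the `SortCounting` helper are hypothetical instrumentation that count how many elements are read from the underlying array. For n distinct values the recursion has n non-empty calls and n + 1 empty ones, and each empty call's Any() must scan the entire source through its chain of Where() filters before concluding the partition is empty. That alone is n(n + 1) reads, so the reads grow quadratically, not like n log n.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyQuickSortDemo
{
    // Number of elements read from the underlying array, summed over every pass.
    private static long pulls;

    // Yields the array's elements one by one, counting each read.
    private static IEnumerable<int> Counted(int[] data)
    {
        foreach (int x in data)
        {
            pulls++;
            yield return x;
        }
    }

    // Hypothetical reconstruction of a lazy LINQ quicksort.
    // Any() and First() each restart enumeration of 'ints', and 'ints' is itself
    // a chain of Where() filters over the original source, so each call walks
    // the source again. This sketch assumes distinct values: elements equal to
    // the pivot (other than the pivot itself) are dropped by the < and > filters.
    public static IEnumerable<int> QuickSort(IEnumerable<int> ints)
    {
        if (!ints.Any())
            return Enumerable.Empty<int>();

        int pivot = ints.First();
        IEnumerable<int> lt = ints.Where(i => i < pivot);
        IEnumerable<int> gt = ints.Where(i => i > pivot);

        return QuickSort(lt).Concat(new[] { pivot }).Concat(QuickSort(gt));
    }

    // Sorts 'data' and reports how many elements were read from it in total.
    public static int[] SortCounting(int[] data, out long reads)
    {
        pulls = 0;
        int[] sorted = QuickSort(Counted(data)).ToArray();
        reads = pulls;
        return sorted;
    }
}
```

Running SortCounting on arrays of increasing size and comparing the reported reads is a quick way to observe the growth yourself.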

Now, .First() could be an O(N) operation, or even worse if, for example, the backing collection must be ordered. If you call QuickSort(new[] { 5, 4, 3, 2, 1 }.OrderBy(x => x)), then ints.First() will have to wait for the OrderBy() to execute, and OrderBy() must first read the whole backing IEnumerable<> (the new[] { }) to sort it (so at least O(N)).

A "fun" example: a First() that is O(N), on an IEnumerable<> that gives different results each time it is enumerated:

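```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    private static int seed = 0;

    // Lazily yields 10 random ints, printing "." for each one produced.
    // Every enumeration creates a new Random with a different seed.
    public static IEnumerable<int> GetSomeInts()
    {
        var rnd = new Random(seed++);

        for (int i = 0; i < 10; i++)
        {
            Console.Write(".");
            yield return rnd.Next(100000);
        }
    }

    public static void Main()
    {
        for (int i = 0; i < 10; i++)
        {
            // OrderBy() must read all 10 elements before First() can return one.
            Console.WriteLine(GetSomeInts().OrderBy(x => x).First());
        }
    }
}
```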

You can see the O(N) cost in the number of "." printed before each result. Try removing the OrderBy() and compare: then only one "." is printed per line. As for the IEnumerable<> returning different results every time it is enumerated... well, there is a for loop :-) Look at the results.
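Because ints might be lazy, expensive to re-enumerate, or even single-use, one defensive alternative is to enumerate it exactly once into an array and sort that. This is a minimal sketch, not the linked code: it assumes a standard in-place quicksort with Lomuto partitioning (last element as pivot) is an acceptable stand-in, and `MaterializedQuickSort` is a hypothetical name. After the single ToArray() pass, every element access is an O(1) array index, so the usual quicksort analysis (O(N log N) on average, O(N^2) in the worst case, e.g. already-sorted input with this pivot choice) applies again.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaterializedQuickSort
{
    // Enumerates the (possibly lazy, possibly single-use) input exactly once,
    // then works only on the resulting array, where indexing is O(1).
    public static int[] Sort(IEnumerable<int> ints)
    {
        int[] a = ints.ToArray();
        QuickSort(a, 0, a.Length - 1);
        return a;
    }

    // In-place quicksort, Lomuto partition, last element as pivot.
    private static void QuickSort(int[] a, int lo, int hi)
    {
        if (lo >= hi) return;

        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            if (a[j] < pivot)
            {
                (a[i], a[j]) = (a[j], a[i]); // move smaller element left
                i++;
            }
        }
        (a[i], a[hi]) = (a[hi], a[i]);       // put pivot in its final place

        QuickSort(a, lo, i - 1);
        QuickSort(a, i + 1, hi);
    }
}
```

Unlike the lazy version, this keeps duplicates (elements equal to the pivot end up on its right) and never touches the original sequence after the first pass.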

xanatos
  • How can `.First()` be O(N)? Retrieving an element from an array is O(1), since "you know where to look". Or does `.First()` do additional operations? – H.J. Meijer May 29 '18 at 07:39
  • But the poster of that question just uses ints.First(), he doesn't sort or anything... Can you please explain it more clearly? – Coder-Man May 29 '18 at 07:41
  • @H.J.Meijer `First()` itself is O(1), but the backing `IEnumerable<>` could be O(N). The total cost of `First()` is then `O(N)` – xanatos May 29 '18 at 07:41
  • @POrekhov The `IEnumerable ints` parameter is opaque. You don't know what it is... `QuickSort(new[] { 5, 4, 3, 2, 1}.OrderBy(x => x))` is perfectly legal. – xanatos May 29 '18 at 07:42
  • @xanatos oh, so, if we have {1,2,3,4,5}, the `Where` operation only takes the "view" of that enumerable, and doesn't copy anything? – Coder-Man May 29 '18 at 07:44
  • @POrekhov Yes... `IEnumerable<>` is a forward-only streaming interface that can sometimes be rewound – xanatos May 29 '18 at 07:45
  • @xanatos oh, that's why, makes sense. – Coder-Man May 29 '18 at 07:45
  • @POrekhov Remember: there is an absolute rule for `IEnumerable<>`: **don't enumerate them twice!** `.Count()`, `.Any()`, `.First()` all count as enumerating. If you need to do it, first `.ToArray()`, then work on the array. – xanatos May 29 '18 at 07:47
  • I see, thanks, btw the Java Streams API prohibits reusing streams. (That's what the question in that post is about.) – Coder-Man May 29 '18 at 07:49
  • @POrekhov And here, instead, you have no guarantee that you can reuse a stream... I don't see a big difference. A one-liner that can't be enumerated twice: `private static bool alreadyEnumerated = false; public static IEnumerable<int> GetSomeInts2() { if (alreadyEnumerated) throw new InvalidOperationException(); alreadyEnumerated = true; yield return 1; }` (it throws an exception the second time) – xanatos May 29 '18 at 07:52
  • If you look at the answer to the linked question, you'll see that in Java they decided to split what in C# is an `IEnumerable<>` between `Iterable` and `Stream`, because they wanted to make it clear what can be enumerated multiple times and what can't. They made this decision *after* the introduction of LINQ in .NET (so after observing the confusion LINQ caused between "can be iterated multiple times" and "can't be iterated multiple times") – xanatos May 29 '18 at 07:58
  • @xanatos, okay, all this applies to IEnumerables that aren't backed by a collection, how about an IEnumerable that **is** backed by a collection, what then? Will the sorting complexity be O(N log N)? – Coder-Man May 29 '18 at 08:03
  • @POrekhov Consider that the complexity of QuickSort is O(N log N) IF you can access elements directly (random access), i.e. with an access cost of O(1). With `IEnumerable<>`, to access the Nth element you must go through the N-1 elements before it, so accessing an element costs O(N). The total complexity of this QuickSort is then O(N) * O(N log N) = O(N^2 log N) – xanatos May 29 '18 at 08:11
  • @xanatos ooooh, so, it's the `First` method that does that, I get it, thanks! – Coder-Man May 29 '18 at 08:13
  • Or more like, when you call `First()`, the `Where()` method has to run first and then `First()` returns the first element, right? I just looked at https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs – Coder-Man May 29 '18 at 08:19