I was looking for the canonical implementation of mergesort in Haskell to port to HOVM, and I found this StackOverflow answer. When porting the algorithm, I noticed something that looked silly: the algorithm has a "halve" function that does nothing but split a list in two at half its length, before recursing and merging. So I thought: why not make better use of this pass and use a pivot, making each half respectively smaller and bigger than that pivot? That would increase the odds that recursive merge calls are applied to already-sorted lists, which might speed up the algorithm!

I've done this change, resulting in the following code:

import Data.List
import Data.Word

randomList :: Word32 -> Word32 -> [Word32]
randomList seed 0    = []
randomList seed size = seed : randomList (seed * 1664525 + 1013904223) (size - 1)

quacksort :: [Word32] -> [Word32]
quacksort []           = []
quacksort [x]          = [x]
quacksort (p : x : xs) = split p (p : x : xs) [] [] where

  -- Splits the list into two halves of elements smaller/bigger than a pivot
  split p []       as bs = merge (quacksort as) (quacksort bs)
  split p (x : xs) as bs = quack p (p < x) x xs as bs

  -- Helper function for `split`
  quack p False x xs as bs = split p xs (x : as) bs
  quack p True  x xs as bs = split p xs as (x : bs)

  -- Merges two lists as a sorted one
  merge []       ys       = ys
  merge xs       []       = xs
  merge (x : xs) (y : ys) = place (x < y) x xs y ys

  -- Helper function for `merge`
  place False x xs y ys = y : merge (x : xs) ys
  place True  x xs y ys = x : merge xs (y : ys)

main :: IO ()
main = do
  let l = randomList 0 2000000
  let b = quacksort l
  print $ sum b

I then benchmarked it and, to my surprise, it was indeed 2x faster than Haskell's official Data.List sort. So I wondered why this isn't used in practice, and, suddenly, I realized the obvious: mergesort does NOT perform better on already-sorted lists. D'oh. So the whole assumption behind quacksort was flawed. Not only that, it would perform terribly on reverse-sorted lists, since it would fail to produce two halves of similar size (unless we could guess a really good pivot). So quacksort is wack in all cases and should never be used in practice. But, then...

Why the hell does it perform 2x faster than Data.List's sort for random lists?

I can't think of a good reason this should be the case. Making each half smaller/bigger than a pivot doesn't change how many times `merge` must be called, so it shouldn't have any positive effect. But reverting to a conventional mergesort does make it 2x slower, so, for some reason, the ordered split helps.
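
For reference, the "conventional mergesort" in question looks roughly like this (a sketch of the textbook halve-based version, not the exact code benchmarked here):

```haskell
-- Textbook top-down mergesort: split by length, recurse, merge.
-- Sketch for comparison; not the exact code benchmarked in the question.
msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = mergeM (msort as) (msort bs)
  where (as, bs) = splitAt (length xs `div` 2) xs

-- Stable merge of two sorted lists.
mergeM :: Ord a => [a] -> [a] -> [a]
mergeM [] ys = ys
mergeM xs [] = xs
mergeM (x : xs) (y : ys)
  | x <= y    = x : mergeM xs (y : ys)
  | otherwise = y : mergeM (x : xs) ys
```

Note that, unlike in quacksort, here the two recursive results overlap in value range, so every element can require a real comparison in `mergeM`.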

Will Ness
MaiaVictor
  • `Data.List` uses the idea of TimSort that makes the assumption that most lists are partly sorted. – Willem Van Onsem Jan 25 '22 at 23:17
  • @WillemVanOnsem interesting, I didn't know that. Most lists sorted in practice are indeed probably partly sorted. – MaiaVictor Jan 25 '22 at 23:22
  • Perhaps there's some specialization going on. Dictionary lookups can be a killer. Is `quacksort` still faster if you make it polymorphic and put it in a different module? – Daniel Wagner Jan 26 '22 at 03:41
  • @DanielWagner the thing is, if you split the list in two halves without using a pivot, it will be 2x slower than itself. So for some reason splitting smaller/bigger elements is making a difference, and that's the mystery. – MaiaVictor Jan 26 '22 at 04:19
  • This is pretty interesting! I managed to double the speed of quacksort and I have some ideas about the performance reasons. I'll post an answer after I sufficiently explored this. – András Kovács Jan 26 '22 at 11:15
  • `quacksort` is an obfuscated and mildly inefficient unstable quick sort. Your arguments to `merge` are always fully partitioned so that it's doing the same thing as `(++)` would, except with extra comparisons thrown in to slow it down. As with any naive-pivot quick sort, it has O(n²) edge cases - and they're going to be the same sorts of cases in which `Data.List.sort` is O(n). It might be faster on random data, but it would likely be a regression in many real-world applications. And the loss of stability makes it unsuitable for some cases involving sorting in multiple steps. – Carl Jan 26 '22 at 20:08
  • @Carl you're correct. I see it now. So this was just a derpy quicksort all along. I guess I couldn't have picked a better name. – MaiaVictor Jan 26 '22 at 21:53
  • you can find a stable quicksort version e.g. at the bottom of [this answer](https://stackoverflow.com/a/11373542/849891) of mine. although it also does three-way partitioning into the smaller, the bigger, and the equal elements, which you don't need here. – Will Ness Jan 27 '22 at 10:59
  • Note that the original merge sort and the merge sort used in nearly all libraries is some variation of a [bottom up merge sort](https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation), typically a hybrid of insertion sort to create small runs, then using merge sort on those runs, such as [timsort](https://en.wikipedia.org/wiki/Timsort) . The top down merge sort you refer to in your question is mostly used for academic purposes, not for actual implementations . – rcgldr Jan 27 '22 at 17:20
  • @rcgldr just to clarify, the last time I looked at Haskell library `sort` code (a year or so ago, maybe more) it was _detecting_ the runs, not creating them. – Will Ness Jan 27 '22 at 19:11
  • BTW I just noticed the Q says **"mergesort does NOT perform better on already sorted lists"**. this is *not true* for the built-in `sort`, because it detects the ordered runs in the input in *O(n)* time, and when there's only one run detected it is just returned as the result, for the total *O(n)* time. this works for descending lists as well as ascending. – Will Ness Jan 27 '22 at 19:21
  • @WillNess it is true for the naive mergesort, though? Isn't the built in sort actually a functional timsort? – MaiaVictor Jan 27 '22 at 19:37
  • I don't believe so, no. Timsort stops the partitioning at some small number of items and does insertion sort for those chunks AFAIR. the library sort doesn't do that at all - last time I checked, which was a year+ ago. it goes along the input list detecting and collecting the ordered runs. then it proceeds with the pairwise mergings starting from those detected chunks. the naive bottom-up mergesort just breaks the list into singletons instead, and then goes on merging them, in pairwise manner, until only one chunk is left. I'll include a link to a prototypical code shortly. – Will Ness Jan 27 '22 at 21:45
  • [here it is](https://en.wikipedia.org/wiki/Haskell_features#Mergesort). so yes, the naive one doesn't care whether the input is sorted or not -- it will perform the same number of comparisons in either case, in *O(n log n)* time. ---- so yeah, I've checked [the source](https://hackage.haskell.org/package/base-4.16.0.0/docs/src/Data.OldList.html#sort) just now and it still does that. – Will Ness Jan 27 '22 at 21:52
  • reading WP [page on timsort](https://en.wikipedia.org/wiki/Timsort), it does look similar. except, in the library version there's no minimal run size requirement and hence no insertion sort (like that page describes). so the library sort is much simpler, what's described at that page is much more sophisticated. timsort "strives to perform balanced merges (a merge thus merges runs of similar sizes)." the built-in just does them pairwise, be what may. – Will Ness Jan 27 '22 at 22:03
  • @WillNess - what about Java's implementation of timsort for an array of objects? – rcgldr Jan 29 '22 at 17:01
  • @rcgldr don't know anything about it. – Will Ness Jan 29 '22 at 17:58
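
Carl's observation above can be checked directly: when every element of the first list is smaller than every element of the second (which quacksort's `split` guarantees for the two sorted halves), the question's `merge` behaves exactly like `++`. A small sketch, with the question's `merge`/`place` pair inlined as guards:

```haskell
-- The question's merge with `place` inlined as guards (same behavior).
mergeQ :: Ord a => [a] -> [a] -> [a]
mergeQ [] ys = ys
mergeQ xs [] = xs
mergeQ (x : xs) (y : ys)
  | x < y     = x : mergeQ xs (y : ys)
  | otherwise = y : mergeQ (x : xs) ys

-- On fully partitioned inputs every `x < y` test succeeds, so mergeQ
-- walks the first list doing nothing but redundant comparisons:
-- mergeQ [1,2,3] [4,5,6] gives the same result as [1,2,3] ++ [4,5,6]
```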

1 Answer

Your `split` splits the list into two ordered halves, so `merge` consumes its first argument in full and then just produces the second half in full. In other words, it is equivalent to `++`, doing redundant comparisons on the first half which always turn out to be `True`.

In a true mergesort, `merge` actually does twice the work on random data, because the two parts are not ordered with respect to each other.

`split`, though, spends some work on the partitioning, whereas an online bottom-up mergesort would spend no work there at all. But the built-in `sort` tries to detect ordered runs in the input, and apparently that extra work is not negligible.
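
A simplified sketch of that run-detecting first pass (modeled loosely on the library's `sequences`; the real `Data.OldList.sort` also reverses descending runs and structures the merging differently):

```haskell
-- Break the input into maximal ascending runs. Simplified sketch of
-- the library sort's first pass; the real code also detects and
-- reverses descending runs.
ascRuns :: Ord a => [a] -> [[a]]
ascRuns = foldr step []
  where
    step x []                  = [[x]]
    step x (run@(y : _) : rs)
      | x <= y                 = (x : run) : rs
      | otherwise              = [x] : run : rs

-- Merge the detected runs pairwise until only one remains.
sortRuns :: Ord a => [a] -> [a]
sortRuns xs = go (ascRuns xs)
  where
    go []   = []
    go [r]  = r
    go runs = go (pairs runs)
    pairs (a : b : rest) = mergeR a b : pairs rest
    pairs rest           = rest
    mergeR [] ys = ys
    mergeR us [] = us
    mergeR (u : us) (v : vs)
      | u <= v    = u : mergeR us (v : vs)
      | otherwise = v : mergeR (u : us) vs
```

On an already-sorted input, `ascRuns` finds a single run after one O(n) scan and `sortRuns` returns it immediately, which is the behavior described above.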

Will Ness
  • `Data.List.sort` is not the usual merge sort. First it finds the ordered runs in the input, resulting in a list of ordered sublists. Then it recursively merges them pairwise until only one remains. That makes a lot of sense when sorting a linked list, but wouldn't make sense for sorting an array. – phadej Feb 03 '22 at 11:26
  • @phadej I believe I didn't say anything different in my answer (and also, [comments](https://stackoverflow.com/questions/70856865/why-is-quacksort-2x-faster-than-data-lists-sort-for-random-lists/70877215?noredirect=1#comment125315548_70856865) above). if you reacted to the "true" in my "true mergesort" phrase, I meant it in comparison with the OP's version, not in the "true Sieve" kind of way. and yes, the question (and answer) explicitly talk about lists, not arrays. – Will Ness Feb 03 '22 at 12:13