
According to Wikipedia's article on quicksort:

Use insertion sort...for invocations on small arrays (i.e. where the length is less than a threshold k determined experimentally). This can be implemented by simply stopping the recursion when less than k elements are left, leaving the entire array k-sorted: each element will be at most k positions away from its final position. Then, a single insertion sort pass finishes the sort in O(k×n) time.

I'm not sure I'm understanding correctly. One way to do it that involves calling insertion sort multiple times is

quicksort(A, i, k):
  if i+threshold < k:
    p := partition(A, i, k)
    quicksort(A, i, p - 1)
    quicksort(A, p + 1, k)
  else:
    insertionsort(A, i, k)

but this would call insertionsort() once for each small subarray. It sounds like insertion sort could instead be called only once, which I don't quite understand: no matter how many times insertion sort is called, it's still generally slower than quicksort.

Is the idea like this?

sort(A):
  quicksort(A, 0, A.length-1)
  insertionsort(A, 0, A.length-1)

So basically the quicksort stops recursing below the threshold without sorting the small pieces, and insertion sort is called once at the very end? How do you know that final call takes only a single O(k×n) pass and doesn't need O(n) passes, i.e. O(n²) time?
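
(For example, if I understand the claim: with threshold k = 3, the coarse quicksort might leave something like [2, 1, 3, 5, 4, 6, 8, 7, 9], where every element is at most 2 positions from its final spot, so the single insertion pass only ever shifts elements a couple of places.)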


1 Answer

Yes, your second pseudocode is correct. The usual analysis of insertion sort is that the outer loop inserts each element in turn (O(n) iterations) and the inner loop moves that element to its correct place (up to O(n) iterations each), for a total of O(n^2). However, the quicksort in your second pseudocode (the one that stops recursing below the threshold) leaves an array that can be sorted by permuting elements within blocks of size at most threshold, so each element moves at most threshold positions. The refined analysis is therefore O(n*threshold), which is the same bound you would get by running insertion sort on each block separately.
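
For concreteness, here is a minimal, self-contained Java sketch of that scheme (quicksortCoarse follows the "quicksort-coarse" naming mentioned in the comments below; THRESHOLD, insertionSort, and the Lomuto-style partition are illustration choices, not anything prescribed by the question or the quoted passage):

import java.util.Arrays;
import java.util.Random;

public class CoarseQuicksort {
    static final int THRESHOLD = 16; // the experimentally chosen k

    // Quicksort that simply stops recursing on small ranges, leaving the array
    // "THRESHOLD-sorted": every element ends up within THRESHOLD positions of
    // its final place.
    static void quicksortCoarse(int[] a, int lo, int hi) {
        if (hi - lo < THRESHOLD) return;   // leave small ranges for the final pass
        int p = partition(a, lo, hi);
        quicksortCoarse(a, lo, p - 1);
        quicksortCoarse(a, p + 1, hi);
    }

    // Lomuto partition around a[hi]; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    // One insertion-sort pass over the whole array. Because no element is more
    // than THRESHOLD positions from home, the inner while loop runs at most
    // THRESHOLD times per element: O(n * THRESHOLD) overall, not O(n^2).
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i;
            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    static void sort(int[] a) {
        quicksortCoarse(a, 0, a.length - 1);
        insertionSort(a);                    // the single pass at the very end
    }

    public static void main(String[] args) {
        int[] a = new Random(1).ints(1000, 0, 10_000).toArray();
        int[] b = a.clone();
        sort(a);
        Arrays.sort(b);                      // sanity check against the library sort
        System.out.println(Arrays.equals(a, b) ? "sorted correctly" : "bug");
    }
}

The per-subarray variant from your first pseudocode differs only in replacing the early return in quicksortCoarse with a call to insertion sort on [lo, hi]; both versions do the same O(n*threshold) insertion work, so the difference is purely in loop start/stop overhead.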

Described by Bentley in the 1999 edition of Programming Pearls, this idea (per Wikipedia) avoids the overhead of starting and stopping the insertion sort loop many times; in essence, the array already contains a natural sentinel value for the insertion loop. IMHO, it's a cute idea, but it's not clearly still a good one given how different the performance characteristics of commodity hardware are now: the final insertion sort requires another pass over the data, which has gotten relatively more expensive, while the cost of starting the loop (moving a few values in registers) has gotten relatively cheaper.

David Eisenstat
  • Care to expand on which performance characteristics you're referring to? Certainly, it makes more sense to insertion sort the elements which happen to be in the cache because they've just been partitioned, but is there something else you're thinking of? – rici Aug 27 '14 at 23:35
  • @rici That, and the cost of shuffling a couple values around in registers has become relatively less expensive compared to memory references. – David Eisenstat Aug 27 '14 at 23:36
  • On the other hand, the cost of missed branch prediction has become a lot more expensive. The end-of-loop test in the insertion sort is likely to be predicted wrong every time the loop ends; in the single-insertionsort version, that only happens once. Of course, without a benchmark, it's all just idle speculation. – rici Aug 27 '14 at 23:38
  • @rici Yeah, who knows? I phrased my criticism very gently for that reason. – David Eisenstat Aug 27 '14 at 23:39
  • @DavidEisenstat you just copied my question. How is your pseudocode any different than mine? What you call quicksort-coarse I called sort. – Celeritas Aug 27 '14 at 23:56
  • @Celeritas The second pseudocode was not in a code block when I wrote the answer. – David Eisenstat Aug 27 '14 at 23:57
  • Using VisualVM to benchmark, it appears that with this "optimization" it's slower (at least on my computer, with randomly generated arrays). – Celeritas Aug 28 '14 at 00:18
  • @Celeritas Thanks for doing the experiment. – David Eisenstat Aug 28 '14 at 00:20
  • This must use an insertion sort which inserts at the end; so you step along the array, moving values upwards as required. If "inserting" item 'i' requires it to be moved up (for example, it's less than item 'i-1', for ascending order), I wonder if it's worth stepping back by k/2 or k/4 and doing a binary chop or two? If you do the insertion sort of each partition at the bottom of the recursion, you know exactly how many items are in the fully sorted part of the partition, so you could unroll some or all of the sort at that point. The benefit rather depends on the cost of the comparison. –  Aug 28 '14 at 13:17