Why is randomised quicksort considered better than standard quicksort?

Question

In Cormen's own words - "The difference is that with the deterministic algorithm, a particular input can elicit that worst-case behavior. With the randomized algorithm, however, no input can always elicit the worst-case behavior."

How does adding a randomized pivot change anything, the algorithm is still gonna perform bad under some particular input and considering each kind of input equally likely this is no better than that of the standard quicksort, only difference being we don't actually know which particular input is going to cause the worst case time complexity. So why is the randomized version considered better?

think less "accidentally bad" and more "I'm being attacked" bad — Mooing Duck, May 04 '21 at 19:01
@MooingDuck so then thats the only advantage? and would it really matter in the real world though, would making quicksort (or any sorting algo) run on worst complexity be of use in a potential hack? — Ayush, May 09 '21 at 14:22
In fact, that is so important that it has it's own name: https://en.wikipedia.org/wiki/Denial-of-service_attack#Application-layer_attacks — Mooing Duck, May 10 '21 at 15:37

score 3 · Answer 1 · answered May 05 '21 at 04:49

Consider the following version of quicksort, where we always pick the last element as the pivot. Now consider the following array:

int[] arr = {9, 8, 7, 6, 5, 4, 3, 2, 1};

When this array is sorted using our version of quicksort, it will always pick the smallest element as its pivot, the last element. And in the first iteration, it will change the array like this:

arr = [1, 8, 7, 6, 5, 4, 3, 2, 9];

Now, it will recurse on the sub-arrays:

s1 = [1, 8, 7, 6, 5, 4, 3, 2];
s2 = [9];

In s1 it will again pick 2 as its pivot, and only 8 and 2 will interchange positions. So, in this way, if we try to formulate a recurrence relation, for its complexity, it will be

T(n) = T(n-1) + O(n)

which corresponds to O(n^2).

So, for this array, the standard version will always take O(n^2) time.

In the randomized version, we first exchange the last element with some random element in the array and then select it as the pivot. So, for the given array, this pivot will split the array randomly, most probably in the middle. So, now the recurrence will be

T(n) = 2T(n/2) + O(n)

which will be O(n * Log(n)).

That's why we consider randomized quicksort better than standard quicksort, because, there is very low probability of bad splits in randomized quicksort.

The fact that sorted sequences (which are common in data) is a worst case scenario for quicksort was solved long ago by picking the median element as a pivot. This makes a bad case just as unlikely with normal data sources as randomized. The real reason for the switch is to avoid bad data from attackers. — btilly, May 05 '21 at 15:48
Charachit, thanks for that, but then again, my question is about why we think randomized quick sort is better (only difference here is that we won't know for which input the worst case is occurring, there still exists cases were it happens, same as the standard version) So its technically no better according to me — Ayush, May 09 '21 at 14:20

score 0 · Answer 2 · answered May 05 '21 at 18:08

The difference is that with the deterministic algorithm, a particular input can elicit that worst-case behavior. With the randomized algorithm, however, no input can always elicit the worst-case behavior.

This should be clarified to mean a truly randomized algorithm. If instead a deterministic pseudo-random algorithm is used, then a deliberately created input can elicit worst case behavior.

With the randomized algorithm, however, no input can always elicit the worst-case behavior.

This should be clarified: even with a truly randomized algorithm, there is still the possibility of some specific input that could elicit worst-case behavior in one or more invocations of a randomized quicksort with that input, but no input could always elicit worst-case behavior for an infinite number of invocations of a truly randomized quicksort on that same input.

Most library implementations of single pivot quicksort use a median of 3 or median of 9, since they can't rely on having fast instructions for random numbers like X86 RRAND and fast divide (for modulo function). If a quicksort was somehow part of an encryption scheme, then a truly randomized algorithm could be used to avoid time based attacks.

Why is randomised quicksort considered better than standard quicksort?

2 Answers2