kth smallest number - quicksort faster than quickselect

Question

I have implemented the following quickselect algorithm to achieve O(n) complexity for median selection (more generally kth smallest number):

static size_t partition(struct point **points_ptr, size_t points_size, size_t pivot_idx)
{
    const double pivot_value = points_ptr[pivot_idx]->distance;

    /* Move pivot to the end. */
    SWAP(points_ptr[pivot_idx], points_ptr[points_size - 1], struct point *);

    /* Perform the element moving. */
    size_t border_idx = 0;
    for (size_t i = 0; i < points_size - 1; ++i) {
            if (points_ptr[i]->distance < pivot_value) {
                    SWAP(points_ptr[border_idx], points_ptr[i], struct point *);
                    border_idx++;
            }
    }

    /* Move pivot to act as a border element. */
    SWAP(points_ptr[border_idx], points_ptr[points_size - 1], struct point *);

    return border_idx;
}

static struct point * qselect(struct point **points_ptr, size_t points_size, size_t k)
{
    const size_t pivot_idx = partition(points_ptr, points_size, rand() % points_size);

    if (k == pivot_idx) { //k lies on the same place as a pivot
            return points_ptr[pivot_idx];
    } else if (k < pivot_idx) { //k lies on the left of the pivot
            //points_ptr remains the same
            points_size = pivot_idx;
            //k remains the same
    } else { //k lies on the right of the pivot
            points_ptr += pivot_idx + 1;
            points_size -= pivot_idx + 1;
            k -= pivot_idx + 1;
    }

    return qselect(points_ptr, points_size, k);
}

Then I tried to compare it with a glibc's qsort() with O(nlog(n)) and was surprised by its superior performance. Here is the measurement code:

double wtime;
wtime = 0.0;
for (size_t i = 0; i < 1000; ++i) {
    qsort(points_ptr, points_size, sizeof (*points_ptr), compar_rand);
    wtime -= omp_get_wtime();
    qsort(points_ptr, points_size, sizeof (*points_ptr), compar_distance);
    wtime += omp_get_wtime();
}
printf("qsort took %f\n", wtime);

wtime = 0.0;
for (size_t i = 0; i < 1000; ++i) {
    qsort(points_ptr, points_size, sizeof (*points_ptr), compar_rand);
    wtime -= omp_get_wtime();
    qselect(points_ptr, points_size, points_size / 2);
    wtime += omp_get_wtime();
}
printf("qselect took %f\n", wtime);

with results similar to qsort took 0.280432, qselect took 8.516676 for an array of 10000 elements. Why is quicksort faster than quickselect?

O() indicates how resource usage *scales*, not how much resources (e.g. time) it *uses*. A algorithm that scales better could still take longer give a given input. — ikegami, Feb 06 '17 at 20:51
Note that quicksort is O(N^2), not O(N log N). That said, `qsort` may implement something other than quicksort. — ikegami, Feb 06 '17 at 20:55
@ikegami: Quicksort is `O(n^2)` *worst case*, `O(n logn)` average/best case. — EOF, Feb 06 '17 at 20:56
Exactly. It scales pretty badly. Worst, its worse performance is for sorted and nearly-sorted lists, IIRC. — ikegami, Feb 06 '17 at 20:59
Randomized quicksort has `O(n log n)` expected worst case. For all intents and purposes, it will never degenerate to `O(n^2)`. OP uses the same idea in his `quickselect`. — IVlad, Feb 06 '17 at 21:03
Can you try shuffling the array without calling `qsort`? It's likely not a proper shuffle anyway, and might mess with the cache. Use a simple Fisher-Yates shuffle instead: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle — IVlad, Feb 06 '17 at 21:07
Unfortunately, I don't know the answer but would suggest you have a look at qsort's source code. — MikeMB, Feb 06 '17 at 21:51
If you implement quicksort - will your implementation comparable with library function by speed? — MBo, Feb 07 '17 at 04:25
The difference is huge. It looks too big to explain by inefficient code, unless your SWAP macro is doing something truly horrible. Are you running the debug compile, or the fully optimized compile? — Matt Timmermans, Feb 07 '17 at 04:26

score 1 · Answer 1 · answered Feb 06 '17 at 22:54

The first obvious answer is: Maybe qsort does not implement quicksort. It has been some time since i read the standard, but i don't think that there is anything requiring that qsort() performs quicksort.

Second: Existing C standard libraries are often heavily optimized (eg using special assembly instructions where available). Combined with how complex performance characteristics of modern CPUs are, this may very well lead to a O(n log n) - which quicksort is not - algorithm being faster then a O(n) algorithm.

My guess would be that you are messing up the cache - something that valgrind / cachegrind sould be able to tell you.

Jan Wrona · Accepted Answer · 2017-02-08T13:06:59.313

Thanks for your suggestions guys, problem with my implementation of quickselect was that it exhibits its worst-case complexity O(n^2) for inputs that contain many repeated elements, which was my case. Glibc's qsort() (it uses mergesort by default) does not exhibit O(n^2) here.

I have modified my partition() function to perform a basic 3-way partitioning and the median-of-three which works nicely for quickselect:

/** \breif Quicksort's partition procedure.                                  
 *                                                                           
 * In linear time, partition a list into three parts: less than, greater than
 * and equals to the pivot, for example input 3 2 7 4 5 1 4 1 will be        
 * partitioned into 3 2 1 1 | 5 7 | 4 4 4 where 4 is the pivot.              
 * Modified version of the median-of-three strategy is implemented, it ends with
 * a median at the end of an array (this saves us one or two swaps).         
 */                                                                          
static void partition(struct point **points_ptr, size_t points_size,
                      size_t *less_size, size_t *equal_size)
{                                                                            
    /* Modified median-of-three and pivot selection. */                      
    struct point **first_ptr = points_ptr;                                   
    struct point **middle_ptr = points_ptr + (points_size / 2);              
    struct point **last_ptr = points_ptr + (points_size - 1);                
    if ((*first_ptr)->distance > (*last_ptr)->distance) {                    
        SWAP(*first_ptr, *last_ptr, struct point *);                         
    }                                                                        
    if ((*first_ptr)->distance > (*middle_ptr)->distance) {                  
        SWAP(*first_ptr, *middle_ptr, struct point *);                       
    }                                                                        
    if ((*last_ptr)->distance > (*middle_ptr)->distance) { //reversed        
        SWAP(*last_ptr, *middle_ptr, struct point *);                        
    }                                                                        
    const double pivot_value = (*last_ptr)->distance;                      

    /* Element swapping. */                                                  
    size_t greater_idx = 0;                                                  
    size_t equal_idx = points_size - 1;                                      
    size_t i = 0;                                                            
    while (i < equal_idx) {                                                  
        const double elem_value = points_ptr[i]->distance;                   

        if (elem_value < pivot_value) {                                      
            SWAP(points_ptr[greater_idx], points_ptr[i], struct point *);    
            greater_idx++;                                                   
            i++;                                                             
        } else if (elem_value == pivot_value) {                              
            equal_idx--;                                                     
            SWAP(points_ptr[i], points_ptr[equal_idx], struct point *);      
        } else { //elem_value > pivot_value                                  
            i++;                                                             
        }                                                                    
    }                                                                        

    *less_size = greater_idx;                                                
    *equal_size = points_size - equal_idx;                                   
}

/** A selection algorithm to find the kth smallest element in an unordered list.
 */                                                                          
static struct point * qselect(struct point **points_ptr, size_t points_size,
                              size_t k)
{                                                                            
    size_t less_size;                                                        
    size_t equal_size;                                                       

    partition(points_ptr, points_size, &less_size, &equal_size);             

    if (k < less_size) { //k lies in the less-than-pivot partition           
        points_size = less_size;                                             
    } else if (k < less_size + equal_size) { //k lies in the equals-to-pivot partition
        return points_ptr[points_size - 1];                                  
    } else { //k lies in the greater-than-pivot partition                    
        points_ptr += less_size;                                             
        points_size -= less_size + equal_size;                               
        k -= less_size + equal_size;                                         
    }                                                                        

    return qselect(points_ptr, points_size, k);                              
}

Results are indeed linear and better than qsort() (I have used the Fisher-Yates shuffle as @IVlad have suggested, so the absolute qsort() times are worse):

array size  qsort     qselect   speedup
1000        0.044678  0.008671  5.152328
5000        0.248413  0.045899  5.412160
10000       0.551095  0.096064  5.736730
20000       1.134857  0.191933  5.912773
30000       2.169177  0.278726  7.782467

kth smallest number - quicksort faster than quickselect

2 Answers2