Optimized merge sort faster than quicksort

Question

http://jsperf.com/optimized-mergesort-versus-quicksort

Why does this half-buffer merge sort run as fast as quicksort?

QuickSort is:

In-place, although its recursion uses O(log n) stack space
Cache-friendly

This half-buffer merge sort:

Uses an n/2 buffer to do merges.
Recurses to a depth of about log(n).
Makes fewer comparisons.
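The jsperf link is dead, so the asker's merge code is not available. The following is only a minimal sketch, in C++ like the answer below, of how a half-buffer merge typically works; the names `HalfBufferMerge`, `HalfBufferMergeSort` and `SortWithHalfBuffer` are hypothetical. Only the left run is copied out to a scratch buffer of at most n/2 elements, and the buffer and the right run are then merged back into the original array. This is safe because the output index never overtakes the unread part of the right run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge sorted runs a[lo, mid) and a[mid, hi) back into a, using buf to hold
// only the left run (at most half of the current range).
void HalfBufferMerge(int a[], int buf[], std::size_t lo, std::size_t mid, std::size_t hi)
{
    assert(lo <= mid && mid <= hi);
    std::size_t nl = mid - lo;
    for (std::size_t k = 0; k < nl; k++)    // copy out only the left run
        buf[k] = a[lo + k];
    std::size_t l = 0, r = mid, o = lo;
    // While l < nl, o < r, so no unread right-run element is overwritten.
    while (l < nl && r < hi) {
        if (buf[l] <= a[r])                  // <= prefers the left run on ties (stable)
            a[o++] = buf[l++];
        else
            a[o++] = a[r++];
    }
    // Leftover right-run elements are already in their final place.
    while (l < nl)
        a[o++] = buf[l++];
}

// Top-down merge sort of a[lo, hi); recursion depth is about log2(n).
void HalfBufferMergeSort(int a[], int buf[], std::size_t lo, std::size_t hi)
{
    if (hi - lo < 2)
        return;
    std::size_t mid = lo + (hi - lo) / 2;
    HalfBufferMergeSort(a, buf, lo, mid);
    HalfBufferMergeSort(a, buf, mid, hi);
    HalfBufferMerge(a, buf, lo, mid, hi);
}

// Convenience wrapper: allocates the n/2 scratch buffer once.
void SortWithHalfBuffer(std::vector<int>& v)
{
    std::vector<int> buf(v.size() / 2);      // the left run never exceeds n/2
    HalfBufferMergeSort(v.data(), buf.data(), 0, v.size());
}
```

Compared with a merge that copies both runs (or ping-pongs between two full arrays), this halves the extra memory, at the cost of copying the left run on every merge.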

My question is: why is the half-buffer merge sort matching the speed of QuickSort in this scenario? Also, is there anything I'm doing wrong in the quickSort that makes it slower?

// swap and builtinLessThan are not shown in the question; assumed definitions:
function swap(a, i, j) {
    var t = a[i];
    a[i] = a[j];
    a[j] = t;
}

function builtinLessThan(x, y) {
    return x < y;
}

function partition(a, i, j) {
    var p = i + Math.floor((j - i) / 2);   // middle element as pivot
    var left = i + 1;
    var right = j;
    swap(a, i, p);                          // park the pivot at a[i]
    var pivot = a[i];
    while (left <= right) {
        while (builtinLessThan(a[left], pivot)) {
            ++left;
        }
        while (builtinLessThan(pivot, a[right])) {
            --right;
        }
        if (left <= right) {
            swap(a, left, right);
            ++left;
            --right;
        }
    }
    swap(a, i, right);                      // move the pivot to its final slot
    return right;
}

function quickSort(a, i, j) {
    var p = partition(a, i, j);
    if (p - i > j - p) {                    // left partition is larger
        if (i < p - 1) {
            quickSort(a, i, p - 1);
        }
        if (p + 1 < j) {
            quickSort(a, p + 1, j);
        }
    } else {
        if (p + 1 < j) {
            quickSort(a, p + 1, j);
        }
        if (i < p - 1) {
            quickSort(a, i, p - 1);
        }
    }
}

Well, I'm testing this in Chrome and in Node. In Node, it's 2x faster than quicksort. In Firefox and Chrome, it's 5% faster. — ahitt6345, Jan 17 '16 at 22:42
Comparisons are cheap. Optimized quicksort does about 2.5 times fewer swaps than merge sort. So if the swap operation is cheap, merge sort will be faster; otherwise quicksort will win. — SashaMN, Jan 17 '16 at 22:56
I already tested expensive comparisons (merge sort was much faster), but I never thought of testing expensive swaps/array accesses. Any idea how to test this? — ahitt6345, Jan 17 '16 at 23:24
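One way to check this without relying on timings is to count operations directly. The sketch below is an illustration only (the names `Counters`, `QuickSortCounted` and `MergeSortCounted` are hypothetical): it sorts identical data with a Hoare-style quicksort like the answer's and with a simple top-down merge sort, tallying compares and element writes ("moves"), with a swap counted as two writes. This merge sort copies back after every merge, which the answer's bottom-up version avoids, so its move count is higher than necessary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Operation tallies; a swap counts as 2 element writes ("moves").
struct Counters {
    std::uint64_t compares = 0;
    std::uint64_t moves = 0;
};

// Hoare-style quicksort of a[lo..hi] (inclusive), middle element as pivot.
static void CountQuick(std::vector<int>& a, long lo, long hi, Counters& c)
{
    auto less = [&c](int x, int y) { ++c.compares; return x < y; };
    long i = lo, j = hi;
    int pivot = a[lo + (hi - lo) / 2];
    while (i <= j) {
        while (less(a[i], pivot)) i++;
        while (less(pivot, a[j])) j--;
        if (i <= j) {
            std::swap(a[i], a[j]);
            c.moves += 2;
            i++;
            j--;
        }
    }
    if (lo < j) CountQuick(a, lo, j, c);
    if (i < hi) CountQuick(a, i, hi, c);
}

// Top-down merge sort of a[lo, hi) through tmp, copying back after each merge.
static void CountMerge(std::vector<int>& a, std::vector<int>& tmp,
                       std::size_t lo, std::size_t hi, Counters& c)
{
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    CountMerge(a, tmp, lo, mid, c);
    CountMerge(a, tmp, mid, hi, c);
    std::size_t l = lo, r = mid, o = lo;
    while (l < mid && r < hi) {
        ++c.compares;
        tmp[o++] = (a[r] < a[l]) ? a[r++] : a[l++];
        ++c.moves;
    }
    while (l < mid) { tmp[o++] = a[l++]; ++c.moves; }
    while (r < hi)  { tmp[o++] = a[r++]; ++c.moves; }
    for (std::size_t k = lo; k < hi; k++) { a[k] = tmp[k]; ++c.moves; }
}

// Each wrapper sorts v in place and returns the tallies.
Counters QuickSortCounted(std::vector<int>& v)
{
    Counters c;
    if (!v.empty()) CountQuick(v, 0, static_cast<long>(v.size()) - 1, c);
    return c;
}

Counters MergeSortCounted(std::vector<int>& v)
{
    Counters c;
    std::vector<int> tmp(v.size());
    CountMerge(v, tmp, 0, v.size(), c);
    return c;
}
```

On random input the merge sort should report fewer compares and the quicksort fewer moves, in line with the comment above; the exact ratios depend on the implementations. To simulate an expensive compare or move, one could put a deliberate delay inside `less` or next to the counted writes and time both sorts again.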
Link to jsperf is dead so I can't tell what's being compared here any more. Can we put the relevant code in this question? — ggorlen, Sep 18 '20 at 17:39

rcgldr · Accepted Answer · 2016-01-18T03:42:02.680

Merge sort does fewer compares, but more moves, than quick sort. Having to call a function for every compare increases the cost of each compare, and since quick sort does more compares, this slows quick sort down more than merge sort. All those if statements in the example quick sort are also slowing it down. If the compare and swap are done inline, then quick sort should be a bit faster when sorting an array of pseudo-random integers.

If running on a processor with 16 registers, such as a PC in 64-bit mode, then a 4-way merge sort using a set of pointers that end up in registers is about as fast as quick sort. A 2-way merge sort averages 1 compare for each element moved, while a 4-way merge sort averages 3 compares for each element moved but takes only half the number of passes, so the number of basic operations is the same. However, the compares are a bit more cache friendly, making the 4-way merge sort about 15% faster than a 2-way merge sort, about the same as quick sort.
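Written out, treating each compare and each move as one basic operation, using the per-element averages stated above and ignoring lower-order terms:

```latex
\begin{aligned}
\text{2-way: } & \log_2 n \text{ passes} \times (n \text{ moves} + n \text{ compares}) = 2n\log_2 n \\
\text{4-way: } & \log_4 n = \tfrac{1}{2}\log_2 n \text{ passes} \times (n \text{ moves} + 3n \text{ compares}) = 2n\log_2 n
\end{aligned}
```

The totals match, but the 4-way merge does half as many moves ($\tfrac{1}{2} n\log_2 n$ versus $n\log_2 n$) and 1.5 times as many compares ($\tfrac{3}{2} n\log_2 n$ versus $n\log_2 n$).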

I'm not familiar with JavaScript, so I'm converting the examples to C++.

Using a C++ conversion of the JavaScript merge sort, it takes about 2.4 seconds to sort 16 million pseudo-random 32-bit integers. The example quick sort shown below takes about 1.4 seconds, and the example bottom-up merge sort shown below takes about 1.6 seconds. As mentioned, a 4-way merge using a set of pointers (or indices) on a processor with 16 registers would also take about 1.4 seconds.
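A minimal sketch of how timings like these might be measured; this harness is an assumption, not the one used for the numbers above, and the names `MakeInput`, `TimeSort` and `StdSort` are hypothetical. It fills the input from `std::mt19937` with a fixed seed so that every algorithm sorts identical data, copies the input outside the timed region, times one sort with `std::chrono::steady_clock`, and checks the result. `StdSort` is only a placeholder; the answer's `QuickSort` could be wrapped the same way for a non-empty vector as `QuickSort(v.data(), 0, (int)v.size() - 1)`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using SortFn = void (*)(std::vector<int>&);

// n pseudo-random 32-bit ints; a fixed seed gives every algorithm the same input.
std::vector<int> MakeInput(std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(std::numeric_limits<int>::min(),
                                            std::numeric_limits<int>::max());
    std::vector<int> v(n);
    for (int& x : v)
        x = dist(gen);
    return v;
}

// Time one sort of a fresh copy of the input. Returns seconds,
// or -1.0 if the output is not sorted.
double TimeSort(SortFn sort, const std::vector<int>& input)
{
    std::vector<int> v = input;                 // copy outside the timed region
    auto t0 = std::chrono::steady_clock::now();
    sort(v);
    auto t1 = std::chrono::steady_clock::now();
    if (!std::is_sorted(v.begin(), v.end()))
        return -1.0;
    return std::chrono::duration<double>(t1 - t0).count();
}

// Placeholder algorithm under test.
void StdSort(std::vector<int>& v) { std::sort(v.begin(), v.end()); }
```

For example, `TimeSort(StdSort, MakeInput(16000000, 1))` times one sort of 16 million integers. Single runs are noisy, so repeating each measurement several times and comparing medians gives more stable numbers.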

C++ quick sort example:

void QuickSort(int a[], int lo, int hi) {
    int i = lo, j = hi;
    int pivot = a[(lo + hi) / 2];
    int t;
    while (i <= j) {            // partition
        while (a[i] < pivot)
            i++;
        while (a[j] > pivot)
            j--;
        if (i <= j) {
            t = a[i];
            a[i] = a[j];
            a[j] = t;
            i++;
            j--;
        }
    }
    if (lo < j)                 // recurse
        QuickSort(a, lo, j);
    if (i < hi)
        QuickSort(a, i, hi);
}

C++ bottom up merge sort example:

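#include <cstddef>
#include <utility>

// Helpers defined below; declared here so the file compiles top to bottom.
void BottomUpMerge(int a[], int b[], size_t ll, size_t rr, size_t ee);
void BottomUpCopy(int a[], int b[], size_t ll, size_t rr);
size_t GetPassCount(size_t n);

// a[] = data to sort, b[] = work array of the same size n
void BottomUpMergeSort(int a[], int b[], size_t n)
{
    size_t s = 1;                           // run size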
    if(GetPassCount(n) & 1){                // if odd number of passes
        for(s = 1; s < n; s += 2)           // swap in place for 1st pass
            if(a[s] < a[s-1])
                std::swap(a[s], a[s-1]);
        s = 2;
    }
    while(s < n){                           // while not done
        size_t ee = 0;                      // reset end index
        while(ee < n){                      // merge pairs of runs
            size_t ll = ee;                 // ll = start of left  run
            size_t rr = ll+s;               // rr = start of right run
            if(rr >= n){                    // if only left run
                rr = n;
                BottomUpCopy(a, b, ll, rr); //   copy left run
                break;                      //   end of pass
            }
            ee = rr+s;                      // ee = end of right run
            if(ee > n)
                ee = n;
            BottomUpMerge(a, b, ll, rr, ee);
        }
        std::swap(a, b);                    // swap a and b
        s <<= 1;                            // double the run size
    }
}

void BottomUpMerge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                          // b[]       index
    size_t l = ll;                          // a[] left  index
    size_t r = rr;                          // a[] right index
    while(1){                               // merge data
        if(a[l] <= a[r]){                   // if a[l] <= a[r]
            b[o++] = a[l++];                //   copy a[l]
            if(l < rr)                      //   if not end of left run
                continue;                   //     continue (back to while)
            while(r < ee)                   //   else copy rest of right run
                b[o++] = a[r++];
            break;                          //     and return
        } else {                            // else a[l] > a[r]
            b[o++] = a[r++];                //   copy a[r]
            if(r < ee)                      //   if not end of right run
                continue;                   //     continue (back to while)
            while(l < rr)                   //   else copy rest of left run
                b[o++] = a[l++];
            break;                          //     and return
        }
    }
}

void BottomUpCopy(int a[], int b[], size_t ll, size_t rr)
{
    while(ll < rr){                         // copy left run
        b[ll] = a[ll];
        ll++;
    }
}

size_t GetPassCount(size_t n)               // return # passes
{
    size_t i = 0;
    for(size_t s = 1; s < n; s <<= 1)
        i += 1;
    return(i);
}

C++ example of a 4-way merge sort using pointers (gotos are used to save code space; it's old code). It starts off doing a 4-way merge; when the end of a run is reached, it switches to a 3-way merge, then a 2-way merge, then a copy of what's left of the remaining run. This is similar to the algorithms used for external sorts, but external sort logic is more general and often handles up to 16-way merges.

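#include <cstddef>
#include <utility>

// Returns a pointer to whichever of a[] or b[] holds the sorted data.
int * BottomUpMergeSort(int a[], int b[], size_t n)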
{
int *p0r;       // ptr to      run 0
int *p0e;       // ptr to end  run 0
int *p1r;       // ptr to      run 1
int *p1e;       // ptr to end  run 1
int *p2r;       // ptr to      run 2
int *p2e;       // ptr to end  run 2
int *p3r;       // ptr to      run 3
int *p3e;       // ptr to end  run 3
int *pax;       // ptr to set of runs in a
int *pbx;       // ptr for merged output to b
size_t rsz = 1; // run size
    if(n < 2)
        return a;
    if(n == 2){
        if(a[0] > a[1])std::swap(a[0],a[1]);
        return a;
    }
    if(n == 3){
        if(a[0] > a[2])std::swap(a[0],a[2]);
        if(a[0] > a[1])std::swap(a[0],a[1]);
        if(a[1] > a[2])std::swap(a[1],a[2]);
        return a;
    }
    while(rsz < n){
        pbx = &b[0];
        pax = &a[0];
        while(pax < &a[n]){
            p0e = rsz + (p0r = pax);
            if(p0e >= &a[n]){
                p0e = &a[n];
                goto cpy10;}
            p1e = rsz + (p1r = p0e);
            if(p1e >= &a[n]){
                p1e = &a[n];
                goto mrg201;}
            p2e = rsz + (p2r = p1e);
            if(p2e >= &a[n]){
                p2e = &a[n];
                goto mrg3012;}
            p3e = rsz + (p3r = p2e);
            if(p3e >= &a[n])
                p3e = &a[n];
            // 4 way merge
            while(1){
                if(*p0r <= *p1r){
                    if(*p2r <= *p3r){
                        if(*p0r <= *p2r){
mrg40:                      *pbx++ = *p0r++;    // run 0 smallest
                            if(p0r < p0e)       // if not end run continue
                                continue;
                            goto mrg3123;       // merge 1,2,3
                        } else {
mrg42:                      *pbx++ = *p2r++;    // run 2 smallest
                            if(p2r < p2e)       // if not end run continue
                                continue;
                            goto mrg3013;       // merge 0,1,3
                        }
                    } else {
                        if(*p0r <= *p3r){
                            goto mrg40;         // run 0 smallest
                        } else {
mrg43:                      *pbx++ = *p3r++;    // run 3 smallest
                            if(p3r < p3e)       // if not end run continue
                                continue;
                            goto mrg3012;       // merge 0,1,2
                        }
                    }
                } else {
                    if(*p2r <= *p3r){
                        if(*p1r <= *p2r){
mrg41:                      *pbx++ = *p1r++;    // run 1 smallest
                            if(p1r < p1e)       // if not end run continue
                                continue;
                            goto mrg3023;       // merge 0,2,3
                        } else {
                            goto mrg42;         // run 2 smallest
                        }
                    } else {
                        if(*p1r <= *p3r){
                            goto mrg41;         // run 1 smallest
                        } else {
                            goto mrg43;         // run 3 smallest
                        }
                    }
                }
            }
            // 3 way merge
mrg3123:    p0r = p1r;
            p0e = p1e;
mrg3023:    p1r = p2r;
            p1e = p2e;
mrg3013:    p2r = p3r;
            p2e = p3e;
mrg3012:    while(1){
                if(*p0r <= *p1r){
                    if(*p0r <= *p2r){
                        *pbx++ = *p0r++;        // run 0 smallest
                        if(p0r < p0e)           // if not end run continue
                            continue;
                        goto mrg212;            // merge 1,2
                    } else {
mrg32:                  *pbx++ = *p2r++;        // run 2 smallest
                        if(p2r < p2e)           // if not end run continue
                            continue;
                        goto mrg201;            // merge 0,1
                    }
                } else {
                    if(*p1r <= *p2r){
                        *pbx++ = *p1r++;        // run 1 smallest
                        if(p1r < p1e)           // if not end run continue
                            continue;
                        goto mrg202;            // merge 0,2
                    } else {
                        goto mrg32;             // run 2 smallest
                    }
                }
            }
            // 2 way merge
mrg212:     p0r = p1r;
            p0e = p1e;
mrg202:     p1r = p2r;
            p1e = p2e;
mrg201:     while(1){
                if(*p0r <= *p1r){
                    *pbx++ = *p0r++;            // run 0 smallest
                    if(p0r < p0e)               // if not end run continue
                        continue;
                    goto cpy11;
                } else {
                    *pbx++ = *p1r++;            // run 1 smallest
                    if(p1r < p1e)               // if not end run continue
                        continue;
                    goto cpy10;
                }
            }
            // 1 way copy
cpy11:      p0r = p1r;
            p0e = p1e;
cpy10:      while (1) {
                *pbx++ = *p0r++;                // copy element
                if (p0r < p0e)                  // if not end of run continue
                    continue;
                break;
            }
            pax += rsz << 2;            // setup for next set of runs
        }
        std::swap(a, b);                // swap ptrs
        rsz <<= 2;                      // quadruple run size
    }
    return a;                           // return sorted array
}

I disagree with the first statement. I left the `compare` function call in the merge sort and used the inline built-in `>` operator in the quicksort only. Although it gained a slight speed boost (0.01 seconds on 200,000 elements), it was still 2x slower than the merge sort with the function-call comparison overhead. — ahitt6345, Jan 18 '16 at 05:08
Also, those extra if statements are optimizations. One of the quicksort optimizations is to recurse into the smaller side of the partition FIRST. That's what all of those if statements do. — ahitt6345, Jan 18 '16 at 05:10
Also, the recursive merge sort implementation is being used; the only thing that was optimized was the `merge` function. — ahitt6345, Jan 18 '16 at 05:12
Also, I don't really understand C++ syntax, especially pointers and how they point to runs in your code, nor how arrays work in C++. I'll keep looking at it to see if I can make sense of it; the 4-way merge sort looks cool, but it's really unclear to me... Also, I structured the quickSort and mergeSort similarly on purpose: each partition/merge is called in a separate function. — ahitt6345, Jan 18 '16 at 05:15
Okay, in the quickSort implementation (in JavaScript) I removed almost ALL of the function-call overhead (including partitions and swaps) and moved everything into one function. It still takes 0.01 seconds longer than the optimized merge sort on 200,000 elements. I left all the function-call overhead in the merge sort because I was too lazy to take it out. — ahitt6345, Jan 18 '16 at 05:28
My eyes are starting to go crazy. I tested both algorithms on 100 arrays of 200,000 elements each, and suddenly quicksort has a 0.5-second lead over the merge sort. — ahitt6345, Jan 18 '16 at 05:35
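Regarding recursion order: when both partitions are handled by ordinary recursive calls, as in the question's `quickSort` and the answer's `QuickSort`, the maximum stack depth is the same whichever side goes first (the question's condition `p - i > j - p` actually sends the larger left side first). What bounds the depth to O(log n) is recursing only into the smaller partition and looping on the larger one. A minimal C++ sketch of that variant of the answer's quicksort; the name `QuickSortBoundedStack` is hypothetical:

```cpp
#include <cassert>
#include <utility>

// Quicksort a[lo..hi] (inclusive). Recurse into the smaller partition and
// loop on the larger one, so each recursive call covers at most half the
// range and the recursion depth stays O(log n).
void QuickSortBoundedStack(int a[], long lo, long hi)
{
    while (lo < hi) {
        long i = lo, j = hi;
        int pivot = a[lo + (hi - lo) / 2];
        while (i <= j) {                    // Hoare-style partition, as in the answer
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                std::swap(a[i], a[j]);
                i++;
                j--;
            }
        }
        // Now a[lo..j] <= pivot <= a[i..hi].
        if (j - lo < hi - i) {              // left side is smaller
            QuickSortBoundedStack(a, lo, j);
            lo = i;                         // continue with the right side
        } else {
            QuickSortBoundedStack(a, i, hi);
            hi = j;                         // continue with the left side
        }
    }
}
```

The `while` loop plays the role of the second recursive call, which is the tail-call elimination that JavaScript engines generally do not perform automatically.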
Okay, I guess you were right about the function call overheads. I was just testing the wrong ones. — ahitt6345, Jan 18 '16 at 05:37
@ahitt6345 - the 4-way merge sort isn't used very often as an in-memory sort. For an external sort, all those if statements to find the smallest of 4 values, then 3 values, then 2 values, are replaced by a loop and/or a minimum heap to find the smallest of 8 to 16 values, since compare overhead is considered to be zero compared to the time it takes to read and write records in an external sort. — rcgldr, Jan 18 '16 at 21:36
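A minimal sketch of the minimum-heap approach described above, assuming the sorted runs are already in memory as `std::vector<int>`s (a real external sort would read them from files in blocks); the name `KWayMerge` is hypothetical. `std::priority_queue` with `std::greater` serves as the minimum heap, holding one (value, run index) entry per run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge k sorted runs by repeatedly taking the smallest head element from a
// min-heap of (value, run index) pairs: O(log k) compares per output element.
std::vector<int> KWayMerge(const std::vector<std::vector<int>>& runs)
{
    using Entry = std::pair<int, std::size_t>;           // (value, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> pos(runs.size(), 0);         // next index in each run
    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); r++) {
        total += runs[r].size();
        if (!runs[r].empty())
            heap.push(Entry(runs[r][0], r));
    }
    std::vector<int> out;
    out.reserve(total);
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second;
        if (++pos[r] < runs[r].size())                    // refill from the same run
            heap.push(Entry(runs[r][pos[r]], r));
    }
    return out;
}
```

Because `std::pair` compares the run index when values are equal, ties are taken from the lower-numbered run first, which keeps the merge stable.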
What do you think about *D.Abhyankar - A Fast Merge Sort* (See https://pdfs.semanticscholar.org/aa6c/00dd9929a494e3b0c70397915d26ef2319c4.pdf)? — Royi, Jul 13 '20 at 08:55
@Royi - The timings seem very slow. I have an 8-year-old computer (Intel 3770K, 3.5 GHz), and a standard merge sort of 100,000 64-bit unsigned integers takes 6.45 ms, less than the time reported in that article for sorting 20,000 elements. I'm not sure that 2 conditionals versus 1, in a loop that involves two memory compares (even if cached), is going to make much difference. Using 5 as the threshold for switching to insertion sort seems low; Visual Studio templates switch at 32 elements. Other versions switch at 32 or 64, depending on which results in an even number of passes for a bottom-up merge sort. — rcgldr, Jul 13 '20 at 15:03
@Royi - most libraries use some variation of a hybrid bottom-up merge sort and insertion sort. In my versions, I determine the number of passes for a pure merge sort. If the number of passes would be odd, I use insertion sort on groups of 32 elements so that the number of merge passes is even. If the number of passes is already even, I use insertion sort on groups of 64 elements so that the number of passes remains even. The merging alternates direction on each pass, eliminating the need for a copy back, and with an even number of passes, the sorted data ends up in the original array. — rcgldr, Jul 13 '20 at 18:27
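A minimal sketch of the hybrid scheme described in the comment above, assuming `int` data and a `std::vector` work buffer; the names `InsertionSort`, `MergeRuns` and `HybridMergeSort` are hypothetical and this is not rcgldr's actual code. If a pure merge sort needs P passes, starting from sorted groups of size 2^k leaves max(0, P - k) merge passes, so k = 5 (32) when P is odd and k = 6 (64) when P is even always gives an even count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort of a[lo, hi).
static void InsertionSort(int a[], std::size_t lo, std::size_t hi)
{
    for (std::size_t i = lo + 1; i < hi; i++) {
        int t = a[i];
        std::size_t j = i;
        while (j > lo && a[j - 1] > t) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = t;
    }
}

// Merge src[ll, rr) and src[rr, ee) into dst[ll, ee); a lone left run is copied.
static void MergeRuns(const int src[], int dst[], std::size_t ll, std::size_t rr, std::size_t ee)
{
    std::size_t l = ll, r = rr, o = ll;
    while (l < rr && r < ee)
        dst[o++] = (src[r] < src[l]) ? src[r++] : src[l++];
    while (l < rr) dst[o++] = src[l++];
    while (r < ee) dst[o++] = src[r++];
}

void HybridMergeSort(std::vector<int>& v)
{
    std::size_t n = v.size();
    std::size_t pure = 0;                           // passes for a pure merge sort
    for (std::size_t s = 1; s < n; s <<= 1)
        pure++;
    std::size_t g = (pure & 1) ? 32 : 64;           // odd -> 32, even -> 64
    std::vector<int> tmp(n);
    int* a = v.data();
    int* b = tmp.data();
    for (std::size_t lo = 0; lo < n; lo += g)       // insertion sort each group
        InsertionSort(a, lo, std::min(lo + g, n));
    std::size_t passes = 0;
    for (std::size_t s = g; s < n; s <<= 1) {       // merge direction alternates
        for (std::size_t ll = 0; ll < n; ll += 2 * s)
            MergeRuns(a, b, ll, std::min(ll + s, n), std::min(ll + 2 * s, n));
        std::swap(a, b);
        passes++;
    }
    assert(passes % 2 == 0);                        // sorted data is back in v
}
```

Because the pass count is always even, no copy back is needed at the end; the `assert` only documents that invariant.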
