
I'm doing a report about different sorting algorithms in C++. What baffles me is that my mergesort seems to be slower than heapsort in both of the languages I tested, even though everything I've read suggests heapsort should be the slower of the two.

My mergesort sorts an unsorted array of size 100000 in 19.8 ms, while heapsort sorts it in 9.7 ms. The code for my mergesort function in C++ is as follows:

// merge the sorted runs array[low..mid] and array[mid+1..high]
void merge(int *array, int low, int mid, int high) {
    int i, j, k;
    int lowLength = mid - low + 1;
    int highLength = high - mid;

    // temporary copies of both runs, allocated on every call
    int *lowArray = new int[lowLength];
    int *highArray = new int[highLength];

    for (i = 0; i < lowLength; i++)
        lowArray[i] = array[low + i];
    for (j = 0; j < highLength; j++)
        highArray[j] = array[mid + 1 + j];

    i = 0; 
    j = 0; 
    k = low; 
    while (i < lowLength && j < highLength) {
        if (lowArray[i] <= highArray[j]) {
            array[k] = lowArray[i];
            i++;
        } else {
            array[k] = highArray[j];
            j++;
        }
        k++;
    }

    while (i < lowLength) {
        array[k] = lowArray[i];
        i++;
        k++;
    }

    while (j < highLength) {
        array[k] = highArray[j];
        j++;
        k++;
    }

    delete[] lowArray;      // free the temporaries, otherwise every call leaks
    delete[] highArray;
}

void mergeSort(int *array, int low, int high) {
    if (low < high) {
        int mid = low + (high - low) / 2;

        mergeSort(array, low, mid);
        mergeSort(array, mid + 1, high);

        merge(array, low, mid, high);
    }
}
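For reference, timings like the ones quoted can be measured with a small `std::chrono` harness. This is only a sketch: it times `std::stable_sort` (a library merge sort) as a stand-in, and the `mergeSort` above can be substituted in its place.

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <vector>

// Returns the time in milliseconds to sort n pseudo-random ints.
// std::stable_sort is a stand-in; substitute
// mergeSort(data.data(), 0, n - 1) to time the function above.
double timeSortMs(int n) {
    std::vector<int> data(n);
    std::mt19937 rng(12345);                 // fixed seed for repeatability
    for (int &x : data)
        x = (int)(rng() >> 1);               // keep values non-negative

    auto t0 = std::chrono::steady_clock::now();
    std::stable_sort(data.begin(), data.end());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Averaging several runs on the same seed (or over several seeds) gives more stable numbers than a single measurement.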
frqency
    You're allocating. Don't. – BeyelerStudios Jan 15 '17 at 20:45
    Where did you get the idea that heap-sort is slower than merge-sort? The extra copy operation, linear in the number of elements of the subarray, is not there in heap-sort, which works in place with only the space it needs. Why should it be the more costly one? – Prem KTiw Jan 15 '17 at 20:49
  • From the big o notation it seems like heapsort is supposed to be slower. – frqency Jan 15 '17 at 20:59
  • Big O notation says the complexity of both of them is O(n log n), so on what basis do you say one is better than the other? – Prem KTiw Jan 15 '17 at 21:01
  • The only reason merge sort might perform better, if at all, is locality of reference, and even that would only show up when you run both on a single dedicated processor, on many randomized inputs, and take the average. Claiming one algorithm is worse than the other from a single input, when their O notations are equivalent, is not very meaningful. The main reason for the increased running time here is probably the heap allocation of the arrays, as stack allocation is faster than heap allocation (here "heap" means the memory space, not the sorting algorithm's heap). – Prem KTiw Jan 15 '17 at 21:08
  • In most "industrial" implementations of mergesort I've seen, they allocate (at most) two lists in the beginning and then swap data from one list to the other and back in the recursive process; they do not allocate a list for every call. – Willem Van Onsem Jan 15 '17 at 21:36
  • @WillemVanOnsem - most "industrial" implementations of merge sort use some variation of bottom up merge sort (which doesn't use recursion). As you commented, there's a one time allocation of a temp buffer, and the direction of the merge changes depending on the pass count (for someone making a top down merge sort, the merge direction depends on the level of recursion). – rcgldr Jan 16 '17 at 02:22

1 Answer


The example merge sort is doing allocation and copying of data in merge(), and both can be eliminated with a more efficient merge sort: do a single allocation for the temp array in a helper / entry function, and avoid the copy by changing the direction of merge depending on the level of recursion, either with two mutually recursive functions (as in the example below) or with a boolean parameter.

Here is an example of a C++ top-down merge sort that is reasonably optimized. A bottom-up merge sort would be slightly faster, and on a system with 16 registers, a 4-way bottom-up merge sort a bit faster still, about as fast as or faster than quicksort.

// prototypes
void TopDownSplitMergeAtoA(int a[], int b[], size_t ll, size_t ee);
void TopDownSplitMergeAtoB(int a[], int b[], size_t ll, size_t ee);
void TopDownMerge(int a[], int b[], size_t ll, size_t rr, size_t ee);

void MergeSort(int a[], size_t n)       // entry function
{
    if(n < 2)                           // if size < 2 return
        return;
    int *b = new int[n];
    TopDownSplitMergeAtoA(a, b, 0, n);
    delete[] b;
}

void TopDownSplitMergeAtoA(int a[], int b[], size_t ll, size_t ee)
{
    if((ee - ll) == 1)                  // if size == 1 return
        return;
    size_t rr = (ll + ee)>>1;           // midpoint, start of right half
    TopDownSplitMergeAtoB(a, b, ll, rr);
    TopDownSplitMergeAtoB(a, b, rr, ee);
    TopDownMerge(b, a, ll, rr, ee);     // merge b to a
}

void TopDownSplitMergeAtoB(int a[], int b[], size_t ll, size_t ee)
{
    if((ee - ll) == 1){                 // if size == 1 copy a to b
        b[ll] = a[ll];
        return;
    }
    size_t rr = (ll + ee)>>1;           // midpoint, start of right half
    TopDownSplitMergeAtoA(a, b, ll, rr);
    TopDownSplitMergeAtoA(a, b, rr, ee);
    TopDownMerge(a, b, ll, rr, ee);     // merge a to b
}

void TopDownMerge(int a[], int b[], size_t ll, size_t rr, size_t ee)
{
    size_t o = ll;                      // b[]       index
    size_t l = ll;                      // a[] left  index
    size_t r = rr;                      // a[] right index
    while(1){                           // merge data
        if(a[l] <= a[r]){               // if a[l] <= a[r]
            b[o++] = a[l++];            //   copy a[l]
            if(l < rr)                  //   if not end of left run
                continue;               //     continue (back to while)
            while(r < ee)               //   else copy rest of right run
                b[o++] = a[r++];
            break;                      //     and return
        } else {                        // else a[l] > a[r]
            b[o++] = a[r++];            //   copy a[r]
            if(r < ee)                  //   if not end of right run
                continue;               //     continue (back to while)
            while(l < rr)               //   else copy rest of left run
                b[o++] = a[l++];
            break;                      //     and return
        }
    }
}
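As a sketch of the bottom-up variant mentioned above (the function names here are hypothetical, not the answer's code): one temp buffer is allocated once, there is no recursion, and the merge direction alternates per pass by swapping the source and destination pointers, so no per-level copy is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// merge the sorted runs src[ll..rr) and src[rr..ee) into dst[ll..ee)
static void mergeRuns(const int *src, int *dst,
                      size_t ll, size_t rr, size_t ee)
{
    size_t o = ll, l = ll, r = rr;
    while (l < rr && r < ee)
        dst[o++] = (src[l] <= src[r]) ? src[l++] : src[r++];
    while (l < rr) dst[o++] = src[l++];     // copy rest of left run
    while (r < ee) dst[o++] = src[r++];     // copy rest of right run
}

void BottomUpMergeSort(int a[], size_t n)
{
    if (n < 2)
        return;
    std::vector<int> tmp(n);                // single allocation
    int *src = a, *dst = tmp.data();
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t ll = 0; ll < n; ll += 2 * width) {
            size_t rr = std::min(ll + width, n);      // start of right run
            size_t ee = std::min(ll + 2 * width, n);  // end of merged run
            mergeRuns(src, dst, ll, rr, ee);
        }
        std::swap(src, dst);                // alternate merge direction
    }
    if (src != a)                           // result ended up in tmp
        std::copy(src, src + n, a);
}
```

Whether the final sorted data lands in `a` or in `tmp` depends only on the number of passes, so at most one copy-back is needed at the end.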
rcgldr
  • Thank you! Yeah, I figured out that allocating the array was highly inefficient. – frqency Jan 16 '17 at 17:01
  • Good explanation and proposed alternative. Just one small bug: `size_t rr = (ll + ee)>>1;` is incorrect if `ll + ee` overflows, which is possible for large arrays on some architectures such as 16-bit segmented x86. Use `size_t rr = ll + ((ee - ll) >> 1);` instead. You could also avoid allocation with an automatic array for sizes below a given threshold. – chqrlie Apr 20 '19 at 09:48
  • @chqrlie - This was optimized code for a specific 32 bit environment, and overflow can't happen with 32 bit integers and 32 bit indexing. In current code, I use `ll +((ee-ll)/2)` (trusting compiler to optimize this to a right shift). – rcgldr Apr 20 '19 at 09:53
  • @rcgldr: I'm afraid overflow happens the same in most environments: `size_t` unsigned integer arithmetic makes the behavior defined and the operation is performed modulo `SIZE_MAX+1`, but the result is still incorrect. Say you are sorting an array of `SIZE_MAX/2 + 2` elements: computing the midpoint of the 2-element slice starting at `SIZE_MAX/2`, `rr` will evaluate to `0` instead of `SIZE_MAX/2 + 1`, with catastrophic results. What saves you in the 32-bit flat memory model is the system limit on array sizes, or the failure to allocate a temporary array of this huge size. – chqrlie Apr 20 '19 at 10:04
  • @rcgldr: and you are correct that arrays of `int` cannot have a size > `SIZE_MAX / (sizeof(int))`, hence no overflow in `ee+ll` if `sizeof(int) > 1`. – chqrlie Apr 20 '19 at 10:09
  • @chqrlie - for a 32 bit environment, overflow would require an array with >= 2^31 == 2GB elements. This could be possible with an array of characters and a 32 bit environment with > 2GB user space, but using quick sort on an array of characters doesn't make much sense when counting sort would be a much faster alternative. – rcgldr Apr 22 '19 at 07:09
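The overflow discussed in these comments is easy to demonstrate with 16-bit arithmetic standing in for `size_t` on a 16-bit platform (a hypothetical illustration, not code from the answer): `(ll + ee) >> 1` wraps when the sum exceeds the type's range, while `ll + ((ee - ll) >> 1)` cannot overflow because `ee >= ll`.

```cpp
#include <cstdint>

// Midpoint via (ll + ee) >> 1, with the sum forced back into 16 bits
// to model size_t arithmetic on a 16-bit platform: the sum wraps
// modulo 2^16 before the shift.
uint16_t overflowingMidpoint(uint16_t ll, uint16_t ee) {
    return (uint16_t)(ll + ee) >> 1;
}

// Overflow-safe form: ee - ll never exceeds the type's range.
uint16_t safeMidpoint(uint16_t ll, uint16_t ee) {
    return ll + ((ee - ll) >> 1);
}
```

For the 2-element slice `[0x8000, 0x8002)` from the comment above, the first form computes `(0x8000 + 0x8002) mod 0x10000 = 0x0002`, then shifts to get `0x0001` instead of the correct midpoint `0x8001`, which the second form returns.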