1

I've been reading about merge sort and I found two kinds of functions.

The first one uses recursion, like this:

#include <stdio.h>

void merge(array, low, mid, high) {
    int temp[MAX];
    int i = low;
    int j = mid + 1;
    int k = low;

    while ((i <= mid) && (j <= high)) {
        if (array[i] <= array[j])
            temp[k++] = array[i++];
        else
            temp[k++] = array[j++];
    }/*End of while*/

    while (i <= mid)
        temp[k++] = array[i++];

    while (j <= high)
        temp[k++] = array[j++];

    for (i = low; i <= high; i++)
        array[i] = temp[i];

}/*End of merge()*/

void merge_sort(int low, int high) {
    int mid;
    if (low != high) {
        mid = (low + high) / 2;
        merge_sort(low, mid);
        merge_sort(mid + 1, high);
        merge(low, mid, high);
    }
}/*End of merge_sort*/

Then I thought that a recursive function is not good for large arrays, since it causes a lot of recursive calls in that case. I think this is a bad way of programming. (Actually, I don't like recursion.)

So I found another way, a merge sort function without recursion:

#include <stdio.h>

#define MAX 30

int main() {
    int arr[MAX], temp[MAX], i, j, k, n, size, l1, h1, l2, h2;

    printf("Enter the number of elements : ");
    scanf("%d", &n);

    for (i = 0; i < n; i++) {
        printf("Enter element %d : ", i + 1);
        scanf("%d", &arr[i]);
    }

    printf("Unsorted list is : ");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);

    /* l1 lower bound of first pair and so on */
    for (size = 1; size < n; size = size * 2) {
        l1 = 0;
        k = 0;  /* Index for temp array */
        while (l1 + size < n) {
            h1 = l1 + size - 1;
            l2 = h1 + 1;
            h2 = l2 + size - 1;
            /* h2 exceeds the limit of arr */
            if (h2 >= n) 
                h2 = n - 1;

            /* Merge the two pairs with lower limits l1 and l2 */
            i = l1;
            j = l2;

            while (i <= h1 && j <= h2) {
                if (arr[i] <= arr[j])
                    temp[k++] = arr[i++];
                else
                    temp[k++] = arr[j++];
            }

            while (i <= h1)
                temp[k++] = arr[i++];
            while (j <= h2)
                temp[k++] = arr[j++];

            /** Merging completed **/
            /*Take the next two pairs for merging */
            l1 = h2 + 1; 
        }/*End of while*/

        /*any pair left */
        for (i = l1; k < n; i++) 
            temp[k++] = arr[i];

        for (i = 0; i < n; i++)
            arr[i] = temp[i];

        printf("\nSize=%d \nElements are : ", size);
        for (i = 0; i < n; i++)
            printf("%d ", arr[i]);

    }/*End of for loop */

    printf("Sorted list is :\n");
    for (i = 0; i < n; i++)
        printf("%d ", arr[i]);

    printf("\n");

    return 0;
}/*End of main()*/

I think this is better than using recursion. This function reduces the recursion to a series of for and while loops! Of course, they behave differently. I think a recursive function is not good for the compiler. Am I right?

JINU_K
  • 133
  • 7
  • 1
    In general I share your opinion that it's preferable to use loops rather than recursion. The latter may cause problems if you're operating with large lists and overflow the stack. Anyways, your question is opinion based and thus _off-topic_ here. – πάντα ῥεῖ Apr 19 '19 at 07:56
  • Oh, I didn't know this topic was off-topic... Anyway, thanks for your comment. – JINU_K Apr 19 '19 at 08:03
  • 1
    For a collection of N elements to sort, you get a recursion depth of ~log_2(N). This means that even if you had to sort all the words of [all articles of Wikipedia](https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia), you'd end up with a recursion depth of a few dozen... – YSC Apr 19 '19 at 08:39
  • 1
    @πάνταῥεῖ - Since bottom up merge sort can be demonstrated to be faster than top down merge sort, I would not consider this to be opinion based. – rcgldr Apr 20 '19 at 01:36

3 Answers

2

Assuming optimized implementations, iterative bottom up merge sort is somewhat faster than recursive top down merge sort, since it skips the recursive generation of indexes. For larger arrays, top down's additional overhead is relatively small, O(log(n)), compared to overall time of O(n log(n)), where in both cases, most of the time is spent doing a merge which can be identical for both bottom up and top down. Top down merge sort uses O(log(n)) stack space, while both use O(n) working space. However, almost all library implementations of stable sort are some variation of iterative bottom up merge sort, such as a hybrid of insertion sort and bottom up merge sort.
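
For illustration, a minimal sketch of a plain (unoptimized) iterative bottom up merge sort might look like the following; the function names are mine and the temporary buffer is allocated from the heap once:

#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs arr[lo..mid) and arr[mid..hi) into tmp[lo..hi). */
static void merge_runs(const int *arr, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
    while (i < mid)
        tmp[k++] = arr[i++];
    while (j < hi)
        tmp[k++] = arr[j++];
}

/* Iterative bottom up merge sort: merge runs of width 1, 2, 4, ...
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int bottom_up_merge_sort(int *arr, size_t n) {
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(arr, tmp, lo, mid, hi);
        }
        memcpy(arr, tmp, n * sizeof *arr);   /* copy the merged runs back */
    }
    free(tmp);
    return 0;
}

An optimized version avoids the copy back by swapping the roles of arr and tmp on each pass, which is essentially what the answers linked below do.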

Link to an answer showing an optimized top down merge sort, using a pair of mutually recursive functions to control the direction of merge to avoid copying of data:

Mergesort implementation is slow
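
The rough idea is sketched below; this is not the code from the linked answer, and the names sort_a2a / sort_a2b are mine. One function leaves its result in the original array, the other in the scratch buffer, so no separate copy back pass is needed:

#include <stddef.h>

/* Merge the sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_into(const int *src, int *dst, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

static void sort_a2b(int *a, int *b, size_t lo, size_t hi);

/* Sort a[lo..hi), leaving the result in a. */
static void sort_a2a(int *a, int *b, size_t lo, size_t hi) {
    if (hi - lo < 2)
        return;                        /* 0 or 1 element: already in place */
    size_t mid = lo + (hi - lo) / 2;
    sort_a2b(a, b, lo, mid);           /* sorted halves end up in b */
    sort_a2b(a, b, mid, hi);
    merge_into(b, a, lo, mid, hi);     /* merging brings them back to a */
}

/* Sort a[lo..hi), leaving the result in b. */
static void sort_a2b(int *a, int *b, size_t lo, size_t hi) {
    if (hi - lo < 2) {
        if (hi - lo == 1)
            b[lo] = a[lo];             /* single element: just copy it */
        return;
    }
    size_t mid = lo + (hi - lo) / 2;
    sort_a2a(a, b, lo, mid);           /* sorted halves stay in a */
    sort_a2a(a, b, mid, hi);
    merge_into(a, b, lo, mid, hi);     /* merging moves them to b */
}

// the caller allocates a scratch buffer b of n elements, then calls sort_a2a(a, b, 0, n)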

Link to an answer that includes a quick sort, 2 way bottom up merge sort, and 4 way bottom up merge sort:

Optimized merge sort faster than quicksort

rcgldr
  • 27,407
  • 3
  • 36
  • 61
  • Thanks for your answer. I read your links, and they are very helpful posts for me. Can I ask something _off-topic_ about one of them? I read the [Mergesort implementation is slow](https://stackoverflow.com/questions/41665863/mergesort-implementation-is-slow/41668335#41668335) link, and I'm confused about one of the comments on that post. – JINU_K Apr 22 '19 at 04:13
  • About your `size_t rr = (ll + ee)>>1` code: someone commented that it could cause overflow, and finally withdrew that. But I don't understand why an `int` array cannot have a size > `SIZE_MAX / sizeof(int)`, and he also said there is no overflow in `ee+ll` if `sizeof(int) > 1`. What do these things mean? – JINU_K Apr 22 '19 at 04:18
  • I searched about that, and I thought the array size is proportional to the available memory, so an index could need over 32 bits. Did I understand that correctly? – JINU_K Apr 22 '19 at 04:22
  • 1
    @JINU_K - In 32 bit mode, size_t is normally a 32 bit unsigned integer. For (ll+ee) to overflow, the sum of ll+ee has to be greater than 2^32 == 4GB, which would mean the array would need >= 2^31 = 2GB elements before overflow could happen. Since that particular example was sorting 32 bit integers, then 2GB of elements would require 8GB of space, which wouldn't be possible in a 32 bit environment. It could be possible if sorting a 2GB array of characters, and I use the alternative of `size_t rr = ll + (ee-ll)/2;`, assuming the compiler will optimize divide by 2 to a right shift. – rcgldr Apr 22 '19 at 05:06
  • 1
    @JINU_K - in 64 bit mode, size_t is normally a 64 bit unsigned integer, so overflow can't happen, as it would require an array with >= 2^63 elements. – rcgldr Apr 22 '19 at 05:08
  • Thanks for your kind reply. As you say, `size_t rr = (ll + ee)>>1` can't overflow, because `size_t` has a huge **SIZE_MAX** _(32 bit would need a 2^31-element array, and 64 bit even more)_. So most functions can never use arrays that large.. right? – JINU_K Apr 22 '19 at 06:11
  • 1
    @JINU_K - for a 32 bit environment, size_t is usually an unsigned 32 bit integer. To get overflow, an array needs >= 2^31 == 2GB elements. This could be possible for an array of `char`, assuming a 32 bit environment with greater than 2GB of user space. However, using quick sort on a character array doesn't make much sense when counting sort would be a much faster algorithm. – rcgldr Apr 22 '19 at 07:07
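
As a side note, the overflow safe midpoint computation discussed in the comments above can be written as a tiny helper (a sketch; the helper name is mine), and compilers normally turn the division by 2 into a right shift:

#include <stddef.h>

/* Midpoint of the index range [lo, hi): lo + (hi - lo) / 2 never exceeds hi,
   so it cannot wrap around, whereas (lo + hi) / 2 can overflow when the sum
   lo + hi does not fit in a size_t. */
static size_t midpoint(size_t lo, size_t hi) {
    return lo + (hi - lo) / 2;
}
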
1

You are somewhat right. Iterative bottom up merge sort is faster than recursive top down merge sort. Both methods are good for compiler ;) but recursive method takes more time to compile.

Ravindu Perera
  • 307
  • 2
  • 5
  • 1
    *recursive method takes more time to compile*... Not really. Both functions compile equally fast; the recursive approach is actually smaller (half as much code), so it might even compile faster, but the difference would be extremely small. Furthermore, compilation time is irrelevant: execution time is the measure of performance. – chqrlie Apr 20 '19 at 09:18
  • @chqrlie As you say, is it more important to construct good logic of code? And I don't like recrusive function, because if one recrusive function call two or more recrution, it cause expotential increase, I thought. And most of recrusive logic function called more than one. I'm very afraid about that.. Am I worried to much? – JINU_K Apr 22 '19 at 04:34
  • 2
    @chqrlie and you misspelt your at-mention – Antti Haapala -- Слава Україні Apr 22 '19 at 09:26
  • 1
    @JINU_K: please notice that you misspell recursive and recursion. Programming in C requires a good understanding of the algorithms. Indeed, recursive functions can cause problems if the number of nested calls is too high, but in the merge sort case the recursion level is bounded by log2(N), so only a few dozen for huge arrays. Iterative programs may behave incorrectly too, with infinite loops, and all programs may have undefined behavior and many other bugs. A simple recursive approach is sometimes easier to understand than a more complicated set of nested loops. – chqrlie Apr 22 '19 at 13:39
  • oh I misspelled about **recrusive->recursive** thank you!! And I realized that I have to learn about time&space complexity. Your replys and answer is good lecture for me:) – JINU_K Apr 23 '19 at 04:27
1

Your code for the recursive approach to merge sort has problems:

  • the prototype for merge does not have the argument types.
  • the array is missing from the argument list of merge_sort.
  • passing high as the index of the last element is error prone and does not allow for empty arrays. You should instead pass the index to the first element beyond the end of the array, such that high - low is the number of elements in the slice to sort. This way the first call to merge_sort can take 0 and the array size.
  • it is both wasteful and incorrect to allocate a full array int temp[MAX]; for each recursive call. Wasteful because the size might be much larger than needed, leading to potential stack overflow, and incorrect if high - low + 1 is larger than MAX leading to writing beyond the end of the array, causing undefined behavior.

This merge_sort function will recurse to a depth of at most log2(high - low), with each call allocating a temporary local array. The recursion depth is not the problem, only about 30 levels for 1 billion records, but the local arrays are! If you try to sort a large array, there might not be enough space on the stack for a copy of this array, much less multiple copies, leading to undefined behavior, most likely a crash.

Note, however, that the iterative approach you found has the same problem, as it also allocates temp[MAX] with automatic storage.

The solution is to allocate a temporary array from the heap at the top and pass it to the recursive function.

Here is an improved version:

#include <stdio.h>
#include <stdlib.h>   /* for malloc() and free() */

static void merge(int *array, int *temp, int low, int mid, int high) {
    int i = low;
    int j = mid;
    int k = 0;

    while (i < mid && j < high) {
        if (array[i] <= array[j])
            temp[k++] = array[i++];
        else
            temp[k++] = array[j++];
    }

    while (i < mid)
        temp[k++] = array[i++];

    while (j < high)
        temp[k++] = array[j++];

    for (i = low, k = 0; i < high; i++, k++)
        array[i] = temp[k];
}

static void merge_sort_aux(int *array, int *temp, int low, int high) {
    if (high - low > 1) {
        int mid = (low + high) / 2;
        merge_sort_aux(array, temp, low, mid);
        merge_sort_aux(array, temp, mid, high);
        merge(array, temp, low, mid, high);
    }
}

int merge_sort(int *array, int size) {
    if (size > 1) {
        int *temp = malloc(size * sizeof(*temp));
        if (temp == NULL)
            return -1;

        merge_sort_aux(array, temp, 0, size);
        free(temp);
    }
    return 0;
}

// call from main as merge_sort(arr, n) after reading the n elements

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • I get what the problem is in my code now, thank you. Memory management is very important in a recursive function, so I have to allocate the temp array dynamically and free the memory. – JINU_K Apr 22 '19 at 10:39