
From http://www.geeksforgeeks.org/merge-sort-for-linked-list/

The slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) completely impossible.

However, I don't really see why quick sort would perform worse than merge sort while sorting a linked list.

In Quick Sort:

Choosing a pivot requires random access, and needs to iterate through the linked list (O(n) per recursion).

Partitioning can be done with a left-to-right sweep (which doesn't require random access):
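For instance, here is a minimal sketch of such a sweep (a hypothetical helper, not taken from the linked article), assuming a typical singly linked node type: each node is appended to one of two sublists depending on how it compares to the pivot value, using only sequential access.

typedef struct Node {
    struct Node *next;
    int data;
} Node;

/* Partition pList around the value pivot with a single left-to-right sweep.
   Nodes < pivot are appended to *pLess, the others to *pNotLess.
   Only sequential access is used. */
void PartitionSweep(Node *pList, int pivot, Node **pLess, Node **pNotLess)
{
    Node **ppLess = pLess;          /* next link to fill in the "less" list */
    Node **ppNotLess = pNotLess;    /* next link to fill in the other list  */
    while (pList != NULL) {
        Node *next = pList->next;   /* save before the node is re-linked    */
        if (pList->data < pivot) {
            *ppLess = pList;
            ppLess = &pList->next;
        } else {
            *ppNotLess = pList;
            ppNotLess = &pList->next;
        }
        pList = next;
    }
    *ppLess = NULL;                 /* terminate both sublists */
    *ppNotLess = NULL;
}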

In Merge Sort:

Splitting at the middle requires random access, and needs to iterate through the linked list (using the fast/slow pointer technique) (O(n) per recursion).

Merging can be done with a left-to-right sweep (which doesn't require random access).
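And here is a minimal sketch of the middle split with the fast/slow pointer technique (same hypothetical Node type as above); again only sequential access is used:

/* Split a non-empty list at its middle: slow advances one node per step,
   fast advances two, so slow ends up at the last node of the first half.
   Returns the head of the second half; the first half is NULL-terminated. */
Node *SplitMiddle(Node *pList)
{
    Node *slow = pList;
    Node *fast = pList->next;
    Node *second;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;    /* head of the second half  */
    slow->next = NULL;      /* terminate the first half */
    return second;
}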

So as far as I can see, both Quick Sort and Merge Sort require random access in each recursion, and I don't see why Quick Sort would perform worse than Merge Sort due to the non-random-access nature of a linked list.

Am I missing something here?

EDIT: I am looking at the partition function where the pivot is the last element and we sweep from the left sequentially. If the partition works differently (i.e. the pivot is in the middle and you maintain two pointers at each end), it would still work fine if the linked list is doubly linked...

SHH
  • I saw the answers in that question. But all those answers assume that the partition method works by moving pointers at each end and the pivot is in the middle. By using a different partition method (where the pivot is always at the end, and you sequentially compare from left to right), all those random-access problems no longer apply – SHH Jan 20 '17 at 20:04
  • You can do a merge sort in multiple (log n) passes, where each pass merges already sorted alternating sub-sequences from the previous pass. If each pass builds *two* linked lists, one for the odd sub-sequences and one for the even, you don't need to access anything except the head of each list. I feel that merge sort is *perfect* for linked lists. – Mark Ransom Jan 20 '17 at 21:41
  • What I don't understand is why anyone would sort any data structure that isn't backed by an array. Converting the list to an array, sorting it, then converting it back, will beat the pants off any in-place technique. – user207421 Jan 20 '17 at 23:21
  • @EJP are you so sure? If you had an object that was hard or expensive to copy, simply replacing the links from one object to the next would be a great alternative. – Mark Ransom Jan 21 '17 at 00:07
  • @user207421: we are not suggesting converting the list to an array of objects, but to allocate an array of pointers to the nodes, sort that with `qsort` and reconstruct the list from the sorted array contents. – chqrlie Jul 13 '19 at 09:58

4 Answers


I'm updating this answer to provide a better comparison. In my original answer below, I include an example of bottom up merge sort, using a small array of pointers to lists. The merge function merges two lists into a destination list. As an alternative, the merge function could merge one list into the other via splice operations, which would mean only updating links about half the time for pseudo random data. For arrays, merge sort does more moves but fewer compares than quicksort, but if the linked list merge is merging one list into the other, the number of "moves" is cut in half.
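As an illustration, here is a minimal sketch (hypothetical code, not the code that was benchmarked) of such a splice-style merge, using the NODE type from the code below: list src is merged into list dst in place, so links are only rewritten where a run of src nodes is spliced in front of a dst node.

NODE *SpliceMerge(NODE *dst, NODE *src)
{
    NODE **pp = &dst;               /* link that points at the current dst node */
    NODE *run, *rest;
    while (src != NULL) {
        if (*pp == NULL) {          /* end of dst reached: append the rest of src */
            *pp = src;
            break;
        }
        if (src->data < (*pp)->data) {
            run = src;              /* collect a run of src nodes < current dst node */
            while (src->next != NULL && src->next->data < (*pp)->data)
                src = src->next;
            rest = src->next;       /* remainder of src after the run */
            src->next = *pp;        /* splice the run in front of the dst node */
            *pp = run;
            pp = &src->next;        /* resume at the dst node that follows the run */
            src = rest;
        } else {
            pp = &(*pp)->next;      /* dst node stays in place: no link update */
        }
    }
    return dst;
}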

For quicksort, the first node could be used as a pivot, and only nodes less than pivot would be moved, forming a list prior to the pivot (in reverse order), which would also mean only updating links about half of the time for pseudo random data.
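A minimal sketch of that partition step (a hypothetical helper, again using the NODE type from the code below): the first node is the pivot, nodes smaller than the pivot are unlinked and pushed onto a separate list (which therefore ends up in reverse order), and the remaining nodes keep their existing links.

/* Partition around the first node (the pivot). Nodes < pivot are unlinked
   and pushed onto *pLess (in reverse order); nodes >= pivot keep their links. */
void PartitionFirstPivot(NODE *pivot, NODE **pLess)
{
    NODE **pp = &pivot->next;       /* link of the last kept (>= pivot) node */
    NODE *node = pivot->next;
    NODE *next;
    *pLess = NULL;
    while (node != NULL) {
        next = node->next;
        if (node->data < pivot->data) {
            *pp = next;             /* unlink: one link update */
            node->next = *pLess;    /* push onto the "less" list */
            *pLess = node;
        } else {
            pp = &node->next;       /* node stays: no link update */
        }
        node = next;
    }
}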

The issue with quicksort is that the partitioning isn't perfect, even with pseudo-random data, while merge sort (top down or bottom up) has the equivalent of perfect partitioning. A common analysis for quicksort considers the probability of a pivot falling in the middle 75% of a list through various means of choosing a pivot, for a 75% / 25% split (versus merge sort always getting a 50% / 50% split). I compared a quicksort with the first node as pivot versus merge sort on 4 million 64-bit pseudo-random integers, and quicksort took 45% longer, with 30% more splice operations (link updates or node "moves") and other overhead.


Original answer

For linked lists, there is an iterative bottom up version of merge sort that doesn't scan lists to split them, which avoids the issue of slow random access performance. A bottom up merge sort for linked lists uses a small (25 to 32 entry) array of pointers to nodes. Time complexity is O(n log(n)), and space complexity is O(1) (the array of 25 to 32 pointers to nodes).

At that web page

http://www.geeksforgeeks.org/merge-sort-for-linked-list

I've posted a few comments, including a link to a working example of bottom up merge sort for linked list, but never received a response from that group. Link to working example used for that web site:

http://code.geeksforgeeks.org/Mcr1Bf

As for quick sort without random access, the first node could be used as the pivot. Three lists would be created, one list for nodes < pivot, one list for nodes == pivot, one list for nodes > pivot. Recursion would be used on the two lists for nodes != pivot. This has worst case time complexity of O(n^2), and worst case stack space complexity of O(n). The stack space complexity can be reduced to O(log(n)), by only using recursion on the shorter list with nodes != pivot, then looping back to sort the longer list using the first node of the longer list as the new pivot. Keeping track of the last node in a list, such as using a tail pointer to a circular list, would allow for quick concatenation of the other two lists. Worst case time complexity remains at O(n^2).

It should be pointed out that if you have the space, it's usually much faster to move the linked list to an array (or vector), sort the array, and create a new sorted list from the sorted array.

Example C code:

#include <stdio.h>
#include <stdlib.h>

typedef struct NODE_ {
    struct NODE_ *next;
    int data;
} NODE;

/* merge two already sorted lists                    */
/* compare uses pSrc2 < pSrc1 to follow the STL rule */
/*   of only using < and not <=                      */
NODE * MergeLists(NODE *pSrc1, NODE *pSrc2)
{
NODE *pDst = NULL;          /* destination head ptr */
NODE **ppDst = &pDst;       /* ptr to head or prev->next */
    if(pSrc1 == NULL)
        return pSrc2;
    if(pSrc2 == NULL)
        return pSrc1;
    while(1){
        if(pSrc2->data < pSrc1->data){  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            if(pSrc2 == NULL){
                *ppDst = pSrc1;
                break;
            }
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            if(pSrc1 == NULL){
                *ppDst = pSrc2;
                break;
            }
        }
    }
    return pDst;
}

/* sort a list using array of pointers to list       */
/* aList[i] == NULL or ptr to list with 2^i nodes    */

#define NUMLISTS 32             /* number of lists */
NODE * SortList(NODE *pList)
{
NODE * aList[NUMLISTS];         /* array of lists */
NODE * pNode;
NODE * pNext;
int i;
    if(pList == NULL)           /* check for empty list */
        return NULL;
    for(i = 0; i < NUMLISTS; i++)   /* init array */
        aList[i] = NULL;
    pNode = pList;              /* merge nodes into array */
    while(pNode != NULL){
        pNext = pNode->next;
        pNode->next = NULL;
        for(i = 0; (i < NUMLISTS) && (aList[i] != NULL); i++){
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if(i == NUMLISTS)   /* don't go beyond end of array */
            i--;
        aList[i] = pNode;
        pNode = pNext;
    }
    pNode = NULL;           /* merge array into one list */
    for(i = 0; i < NUMLISTS; i++)
        pNode = MergeLists(aList[i], pNode);
    return pNode;
}

/* allocate memory for a list */
/* create list of nodes with pseudo-random data */
NODE * CreateList(int count)
{
NODE *pList;
NODE *pNode;
int i;
int r;
    /* allocate nodes */
    pList = (NODE *)malloc(count * sizeof(NODE));
    if(pList == NULL)
        return NULL;
    pNode = pList;                  /* init nodes */
    for(i = 0; i < count; i++){
        r  = (((int)((rand()>>4) & 0xff))<< 0);
        r += (((int)((rand()>>4) & 0xff))<< 8);
        r += (((int)((rand()>>4) & 0xff))<<16);
        r += (((int)((rand()>>4) & 0x7f))<<24);
        pNode->data = r;
        pNode->next = pNode+1;
        pNode++;
    }
    (--pNode)->next = NULL;
    return pList;
}

#define NUMNODES (1024)         /* number of nodes */
int main(void)
{
void *pMem;                     /* ptr to allocated memory */
NODE *pList;                    /* ptr to list */
NODE *pNode;
int data;

    /* allocate memory and create list */
    if(NULL == (pList = CreateList(NUMNODES)))
        return(0);
    pMem = pList;               /* save ptr to mem */
    pList = SortList(pList);    /* sort the list */
    data = pList->data;         /* check the sort */
    while((pList = pList->next) != NULL){
        if(data > pList->data){
            printf("failed\n");
            break;
        }
        data = pList->data;
    }
    if(pList == NULL)
        printf("passed\n");
    free(pMem);                 /* free memory */
    return(0);
}
rcgldr
  • @chqrlie - if interested in a C++ version using iterators, take a look at "update #2" in this [old answer](https://stackoverflow.com/questions/40622430/40629882#40629882). – rcgldr Jul 13 '19 at 01:22
  • @chqrlie - I updated my answer, it's fixed now. Thanks for catching that. – rcgldr Jul 13 '19 at 08:14

You can split the list by a pivot element in linear time using constant extra memory (even though it's quite painful to implement for a singly-linked list), so it would have the same average time complexity as merge sort (the good thing about merge sort is that it's O(N log N) in the worst case). So they can be the same in terms of asymptotic behavior.

It can be hard to tell which one is faster (because the real run time is a property of an implementation, not the algorithm itself).

However, a partition that uses a random pivot is quite a mess for a singly linked list (it's possible, but the method I can think of has a larger constant than just getting two halves for the merge sort). Using the first or the last element as a pivot has an obvious issue: it works in O(N^2) for sorted (or nearly sorted) lists. Taking this into account, I'd say that merge sort would be the more reasonable choice in most cases.

kraskevich

As already pointed out, if singly linked lists are used, merge sort and quick sort have the same average running time: O(n log n).

I'm not 100% sure which partition algorithm you have in mind, but the one sweeping algorithm I can come up with would delete the current element from the list if it is larger than the pivot element and insert it at the end of the list. Making this change requires at least 3 operations:

  1. the link of the parent element must be changed
  2. the link of the last element must be changed
  3. the record of which element is the last one must be updated

However, this must be done in only about 50% of the cases, so on average the partition function makes 1.5 link changes per element.
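To make those three updates concrete, here is a minimal sketch of such a sweep (hypothetical code, reusing the NODE type from the other answers and assuming the number of nodes after the pivot and a tail pointer are available):

/* Sweep the count nodes that follow pivot; nodes larger than the pivot are
   unlinked and re-appended at the tail (three updates), smaller or equal
   nodes stay where they are (no update). */
void PartitionToEnd(NODE *pivot, NODE **pTail, int count)
{
    NODE **pp = &pivot->next;        /* link coming from the "parent" element  */
    NODE *node;
    while (count-- > 0) {
        node = *pp;
        if (node->data > pivot->data) {
            *pp = node->next;        /* 1. the parent element's link changes   */
            (*pTail)->next = node;   /* 2. the last element's link changes     */
            node->next = NULL;
            *pTail = node;           /* 3. the tail pointer is updated         */
        } else {
            pp = &node->next;        /* element stays in place: nothing to do  */
        }
    }
}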

The merge function, on the other hand: in about 50% of the cases, two consecutive elements in the merged list come from the same original list, so there is nothing to do because these elements are already linked. In the other cases, one link has to be changed - to the head of the other list. That is on average 0.5 link changes per element for the merge function.

Clearly, one has to know the exact costs of the operations to know the final result, so this is only a hand-waving explanation.

ead
  • I think you mean `O(n log n)`. – Mark Ransom Jan 20 '17 at 21:44
  • Merge sort has maximum time complexity of O(n log(n)), while quick sort maximum time complexity is O(n^2). Bottom up merge sort for linked lists only involves sequential access of linked lists, removing a node from the front of a list and appending a node to the end of a list, without any list splitting (I included example code in my answer). – rcgldr Jan 21 '17 at 18:30

Expanding on rcgldr's answer, I wrote a simplistic[1] implementation of Quick Sort on linked lists using the first element as pivot (which behaves pathologically badly on sorted lists) and ran a benchmark on lists with pseudo-random data.

I implemented Quick Sort using recursion but taking care of avoiding a stack overflow on pathological cases by recursing only on the smaller half.

I also implemented the proposed alternative with an auxiliary array of pointers to the nodes.

Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct NODE {
    struct NODE *next;
    int data;
} NODE;

/* merge two already sorted lists                    */
/* compare uses pSrc2 < pSrc1 to follow the STL rule */
/*   of only using < and not <=                      */
NODE *MergeLists(NODE *pSrc1, NODE *pSrc2) {
    NODE *pDst = NULL;          /* destination head ptr */
    NODE **ppDst = &pDst;       /* ptr to head or prev->next */
    for (;;) {
        if (pSrc2->data < pSrc1->data) {  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            if (pSrc2 == NULL) {
                *ppDst = pSrc1;
                break;
            }
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            if (pSrc1 == NULL) {
                *ppDst = pSrc2;
                break;
            }
        }
    }
    return pDst;
}

/* sort a list using array of pointers to list       */
NODE *MergeSort(NODE *pNode) {
#define NUMLISTS 32             /* number of lists */
    NODE *aList[NUMLISTS];      /* array of lists */
    /* aList[i] == NULL or ptr to list with 2^i nodes    */
    int i, n = 0;

    while (pNode != NULL) {
        NODE *pNext = pNode->next;
        pNode->next = NULL;
        for (i = 0; i < n && aList[i] != NULL; i++) {
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if (i == NUMLISTS)   /* don't go beyond end of array */
            i--;
        else
        if (i == n) /* extend array */
            n++;
        aList[i] = pNode;
        pNode = pNext;
    }
    for (i = 0; i < n; i++) {
        if (!pNode)
            pNode = aList[i];
        else if (aList[i])
            pNode = MergeLists(aList[i], pNode);
    }
    return pNode;
}

void QuickSortRec(NODE **pStart, NODE *pList, NODE *stop) {
    NODE *pivot, *left, *right;
    NODE **ppivot, **pleft, **pright;
    int data, nleft, nright;

    while (pList != stop && pList->next != stop) {
        data = pList->data;     // use the first node as pivot
        pivot = pList;
        ppivot = &pList->next;
        pleft = &left;
        pright = &right;
        nleft = nright = 0;

        while ((pList = pList->next) != stop) {
            if (data == pList->data) {
                *ppivot = pList;
                ppivot = &pList->next;
            } else
            if (data > pList->data) {
                nleft++;
                *pleft = pList;
                pleft = &pList->next;
            } else {
                nright++;
                *pright = pList;
                pright = &pList->next;
            }
        }
        *pleft = pivot;
        *pright = stop;
        *ppivot = right;
        if (nleft >= nright) {       // recurse on the smaller part
            if (nright > 1)
                QuickSortRec(ppivot, right, stop);
            pList = left;
            stop = pivot;
        } else {
            if (nleft > 1)
                QuickSortRec(pStart, left, pivot);
            pStart = ppivot;
            pList = right;
        }
    }
    *pStart = pList;
}

NODE *QuickSort(NODE *pList) {
    QuickSortRec(&pList, pList, NULL);
    return pList;
}

int NodeCmp(const void *a, const void *b) {
    NODE *aa = *(NODE * const *)a;
    NODE *bb = *(NODE * const *)b;
    return (aa->data > bb->data) - (aa->data < bb->data);
}

NODE *QuickSortA(NODE *pList) {
    NODE *pNode;
    NODE **pArray;
    int i, len;

    /* compute the length of the list */
    for (pNode = pList, len = 0; pNode; pNode = pNode->next)
        len++;
    if (len > 1) {
        /* allocate an array of NODE pointers */
        if ((pArray = malloc(len * sizeof(NODE *))) == NULL) {
            QuickSortRec(&pList, pList, NULL);
            return pList;
        }
        /* initialize the array from the list */
        for (pNode = pList, i = 0; pNode; pNode = pNode->next)
            pArray[i++] = pNode;
        qsort(pArray, len, sizeof(*pArray), NodeCmp);
        for (i = 0; i < len - 1; i++)
            pArray[i]->next = pArray[i + 1];
        pArray[i]->next = NULL;
        pList = pArray[0];
        free(pArray);
    }
    return pList;
}

int isSorted(NODE *pList) {
    if (pList) {
        int data = pList->data;
        while ((pList = pList->next) != NULL) {
            if (data > pList->data)
                return 0;
            data = pList->data;
        }
    }
    return 1;
}

void test(int count) {
    NODE *pMem1, *pMem2, *pMem3;
    NODE *pList1, *pList2, *pList3;
    int i;
    clock_t t1, t2, t3;

    /* create linear lists of nodes with pseudo-random data */
    srand(clock());

    if (count == 0
    ||  (pMem1 = malloc(count * sizeof(NODE))) == NULL
    ||  (pMem2 = malloc(count * sizeof(NODE))) == NULL
    ||  (pMem3 = malloc(count * sizeof(NODE))) == NULL)
        return;

    for (i = 0; i < count; i++) {
        int data = rand();
        pMem1[i].data = data;
        pMem1[i].next = &pMem1[i + 1];
        pMem2[i].data = data;
        pMem2[i].next = &pMem2[i + 1];
        pMem3[i].data = data;
        pMem3[i].next = &pMem3[i + 1];
    }
    pMem1[count - 1].next = NULL;
    pMem2[count - 1].next = NULL;
    pMem3[count - 1].next = NULL;

    t1 = clock();
    pList1 = MergeSort(pMem1);
    t1 = clock() - t1;

    t2 = clock();
    pList2 = QuickSort(pMem2);
    t2 = clock() - t2;

    t3 = clock();
    pList3 = QuickSortA(pMem3);
    t3 = clock() - t3;

    printf("%10d", count);
    if (isSorted(pList1))
        printf(" %10.3fms", t1 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    if (isSorted(pList2))
        printf(" %10.3fms", t2 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    if (isSorted(pList3))
        printf(" %10.3fms", t3 * 1000.0 / CLOCKS_PER_SEC);
    else
        printf("     failed");
    printf("\n");

    free(pMem1);
    free(pMem2);
    free(pMem3);
}

int main(int argc, char **argv) {
    int i;

    printf("        N      MergeSort    QuickSort   QuickSortA\n");
    if (argc > 1) {
        for (i = 1; i < argc; i++)
            test(strtol(argv[i], NULL, 0));
    } else {
        for (i = 10; i < 23; i++)
            test(1 << i);
    }
    return 0;
}

Here is the benchmark on lists of geometrically increasing lengths; the times grow roughly as N log(N):

        N      MergeSort    QuickSort   QuickSortA
      1024      0.052ms      0.057ms      0.105ms
      2048      0.110ms      0.114ms      0.190ms
      4096      0.283ms      0.313ms      0.468ms
      8192      0.639ms      0.834ms      1.022ms
     16384      1.233ms      1.491ms      1.930ms
     32768      2.702ms      3.786ms      4.392ms
     65536      8.267ms     10.442ms     13.993ms
    131072     23.461ms     34.229ms     27.278ms
    262144     51.593ms     71.619ms     51.663ms
    524288    114.656ms    240.946ms    120.556ms
   1048576    284.717ms    535.906ms    279.828ms
   2097152    707.635ms   1465.617ms    636.149ms
   4194304   1778.418ms   3508.703ms   1424.820ms

QuickSort() is approximately half as fast as MergeSort() on these datasets, but would behave much worse on partially ordered sets and other pathological cases, whereas MergeSort has a regular time complexity that does not depend on the dataset and performs a stable sort. QuickSortA() performs marginally better than MergeSort() for large datasets on my system, but performance will depend on the actual implementation of qsort, which does not necessarily use a Quick Sort algorithm.

MergeSort() does not allocate any extra memory and performs a stable sort, which makes it a clear winner to sort lists.


[1] Well, not so simplistic after all, but the choice of pivot is too simple.

chqrlie