3

I was thinking about counting sort and how we implement it, actually how the algorithm works. I am stuck with one part, algorithm is really straightforward and easy to understand but one part of it doesn't seem necessary. I thought people might mistaken or so, but it seems like everyone using the same method so I am mistaken somewhere. Can you please explain.

Here is code for counting sort from geeksforgeeks

    // C Program for counting sort
#include <stdio.h>
#include <string.h>
#define RANGE 255

// The main function that sort the given string arr[] in
// alphabatical order
void countSort(char arr[])
{
    // The output character array that will have sorted arr
    char output[strlen(arr)];

    // Create a count array to store count of inidividul
    // characters and initialize count array as 0
    int count[RANGE + 1], i;
    memset(count, 0, sizeof(count));

    // Store count of each character
    for(i = 0; arr[i]; ++i)
        ++count[arr[i]];

    // Change count[i] so that count[i] now contains actual
    // position of this character in output array
    for (i = 1; i <= RANGE; ++i)
        count[i] += count[i-1];

    // Build the output character array
    for (i = 0; arr[i]; ++i)
    {
        output[count[arr[i]]-1] = arr[i];
        --count[arr[i]];
    }

    // Copy the output array to arr, so that arr now
    // contains sorted characters
    for (i = 0; arr[i]; ++i)
        arr[i] = output[i];
}

// Driver program to test above function
int main()
{
    char arr[] = "geeksforgeeks";//"applepp";

    countSort(arr);

    printf("Sorted character array is %s\n", arr);
    return 0;
}

Cool , but about this part:

// Build the output character array
        for (i = 0; arr[i]; ++i)
        {
            output[count[arr[i]]-1] = arr[i];
            --count[arr[i]];
        }

Why do I need this ?? Ok I counted my numbers :

Let's say I had array -> [1, 3, 6, 3, 2, 4]

         INDEXES     0  1  2  3  4  5  6
  I created this -> [0, 1, 1, 2, 1, 0, 1]

Than this part does this:

  [0, 1+0, 1+1, 2+2, 4+1, 0+5, 1+5]
  [0, 1, 2, 4, 5, 5, 6]

BUT WHY ??

Can't I just use my array like the one before ? Here is my idea and my code, please explain why it's wrong or, why other way is more useful.

void countingSort (int *arr) {

    int countingArray[MAX_NUM] = {0};

    for (i = 0 ; i < ARRAY_SIZE ; i++)
        countingArray[arr[i]]++;

    int output_Index = 0;

    for (i = 0 ; i < MAX_NUM ; i++)
        while ( countingArray[i]-- )
            arr[output_Index++] = i;
}

2 Answers2

3

For the simple case where you are sorting an array of integers, your code is simpler and better.

However, counting sort is a general sorting algorithm that can sort based on a sorting key derived from the items to be sorted, which is used to compare them, as opposed to directly comparing the items themselves. In the case of an array of integers, the items and the sort keys can be one and the same, you just compare them directly.

It looks to me as though the geeksforgeeks code has been adapted from a more generic example that allows the use of sorting keys, something like this:

// Store count of each item
for(i = 0; arr[i]; ++i)
    ++count[key(arr[i])];

// Change count[i] so that count[i] now contains actual
// position of this character in output array
for (i = 1; i <= RANGE; ++i)
    count[i] += count[i-1];

// Build the output array
for (i = 0; arr[i]; ++i)
{
    output[count[key(arr[i])]-1] = arr[i];
    --count[key(arr[i])];
}

Where key is a function that computes a sort key based on an item (for an integer type you could just return the integer itself). In this case MAX_NUM would have to be replaced with MAX_KEY.

This approach uses the extra output array because the final result is generated by copying the items from arr rather than simply from the information in count (which only contains the count of items with each key). However, an in-place counting sort is possible.

The algorithm also guarantees a stable sort (items with the same sort key have their relative order preserved by sorting) - this is meaningless when sorting integers.

However, since they have removed the ability to sort based on key, there's no reason for the extra complexity and your way is better.

It's also possible that they have copied the code from a language like C++, where the int cast (which will be called when using an item to index an array) could be overloaded to return the sort key, but have mistakenly converted to C.

Community
  • 1
  • 1
samgak
  • 23,944
  • 4
  • 60
  • 82
1

I think your version is a better approach. I suspect that the person who wrote this code sample had probably written similar code samples for other sorting algorithms — there are many sorting algorithms where you do need separate "scratch space" — and didn't put enough thought into this one.

Alternatively, (s)he may have felt that the algorithm is easier to explain if we separate "generating the result" from "moving the result into place"? I don't agree, if so, but the detailed comments make clear that (s)he had pedagogy in mind.

That said, there are a few minor issues with your version:

  • You forgot to declare i.
  • You should take the array-length as a parameter, rather than using a hardcoded ARRAY_SIZE. (In the code sample, this issue is avoided by using a string, so they could iterate until the terminating null byte.)
  • This may be subjective, but rather than while ( countingArray[i]-- ), I think it's clearer to write for (int j = 0; j < countingArray[i]; ++j).
ruakh
  • 175,680
  • 26
  • 273
  • 307
  • More subjective, `memset`? – Mooing Duck Feb 01 '17 at 18:28
  • I like the answer. Still tho, my code was for a competition so I have generally defined variables, like MAX_NUM is actually in main function and i is also defined generally, I don't like to put too much parameters in a function if it is not necessary. –  Feb 01 '17 at 18:28
  • @MooingDuck What about memset ? –  Feb 01 '17 at 18:29
  • @BedirTapkan: `while ( countingArray[i]-- ) arr[output_Index++] = i;` appears replacable with a single `memset` call. – Mooing Duck Feb 01 '17 at 19:16
  • @MooingDuck Maan :D I never used memset like that, nice! Thanks! –  Feb 01 '17 at 19:38
  • @MooingDuck: The OP's function, unlike the geeksforgeeks version, is sorting an `int` array; I don't think `memset` would work for that. – ruakh Feb 01 '17 at 21:35
  • C++'s `std::fill` works fine, but you're right that I can't find anything equivalent for C. He'd have to make a custom variant. – Mooing Duck Feb 01 '17 at 21:58