I am supposed to write an algorithm in C that takes an array of integers in any base (radix) and uses Radix Sort to put them in ascending order. My algorithm cannot impose any limit on the length of the array, the number of digits, or the base. When I say integers, I do not mean I will always receive an array of ints; it can be an array of long ints, chars, char pointers, etc.

Given that, I thought the best way to impose as few limitations as possible on my algorithm was to access the vector's positions directly and cast the value found there to long unsigned int. The limitation of this approach is that the user must use digit "weights" that comply with the character encoding the code was compiled with. For example, in ASCII x is 120 and a is 97 (in decimal), so if the given base uses the digits x and a, x must be the greater digit.

Here is my algorithm:

/*!
 *  @function   radixSort
 *  @abstract   Sort a vector using Radix Sort.
 *  @discussion This function takes a vector of positive
 *              integers in any base (radix) and sorts
 *              it in ascending order using the Radix
 *              Sort algorithm.
 *  @param      v       the vector with elements to sort
 *  @param      n       the length of the vector v
 *  @param      getMax  the function to get the maximum element of v
 */
void radixSort(void *v[], long unsigned int n, long unsigned int (*getMax)(void*[], long unsigned int)) {
    long unsigned int i;
    void* semiSorted[n];
    long unsigned int significantDigit = 1;
    long unsigned int max = (*getMax)(v, n);
    while (max / significantDigit > 0){
        long unsigned int intermediate[10] = {0};
        for(i=0; i<n; i++)
            intermediate[(long unsigned int)v[i]/significantDigit % 10]++;
        for(i=1; i<10; i++)
            intermediate[i] += intermediate[i-1];
        for(i=n; i>0; i--)
            semiSorted[--intermediate[(long unsigned int)v[i-1]/significantDigit % 10]] = v[i-1];
        for(i=0; i<n; i++)
            v[i] = semiSorted[i];
        significantDigit *= 10;
    }
}

This algorithm works with every type I tested except an array of strings. If I have something like:

#include <stdio.h>

int main() {
    long unsigned int i; /* loop index for the printing loops */
    char *b36v[7] = {"014", "000", "8f6", "080", "056", "00a", "080"};
    printf("b36v unsorted: "); for(i=0; i<7; i++) printf("%s ", b36v[i]); printf("\n");
    radixSort(b36v, 7, getMaxFunc);
    printf("b36v sorted  : "); for(i=0; i<7; i++) printf("%s ", b36v[i]); printf("\n");
    return 0;
}

The output is:

b36v unsorted: 014 000 8f6 080 056 00a 080
b36v sorted  : 014 000 8f6 080 080 056 00a

So it doesn't work as expected. I expected the sorted vector to be:

000 00a 014 056 080 080 8f6

The only thing I can see is that equal entries are grouped together, as expected. After changing the values and observing the outputs, I think the algorithm is probably taking the memory addresses and sorting those, because a given position of the original vector holds no actual value, only a pointer to an array of chars.
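A minimal sketch of that difference, assuming the digits are the usual 0-9 and a-z of base 36. The names `key_from_cast` and `key_from_digits` are hypothetical and not part of the code above. The first returns what the cast inside radixSort effectively sorts by, the second returns the number the digits actually represent:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Key seen by a cast like (long unsigned int)v[i] when v[i] is a
 * char pointer: the address of the string, not its contents. */
uintptr_t key_from_cast(const char *s)
{
    return (uintptr_t)s;
}

/* Hypothetical alternative key: the numeric value of the digits.
 * strtoul only accepts bases 2..36 and the value must fit in an
 * unsigned long, so this is an illustration, not a general solution. */
unsigned long key_from_digits(const char *s, int base)
{
    assert(base >= 2 && base <= 36);
    return strtoul(s, NULL, base);
}
```

With these, `key_from_digits("014", 36)` is 40 and `key_from_digits("00a", 36)` is 10, while `key_from_cast` depends only on where the compiler happened to place each string literal.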

I just want to know if there is some way to handle these cases of 2D (or even 3D, 4D, ...) arrays by writing universal code that handles everything from int/char arrays to arrays of arrays of arrays... of long unsigned ints, without asking for too many parameters. I can't figure out how to manage this situation in C.

Rodrigo Oliveira
  • @EmilianoSorbello Traditionally yes. My algorithm assumes that, but think about a base `k` with `k > 62`: you will start using symbols as digits, and in theory nothing says that `]` is greater than `;`. That's why I assumed the user follows the order of an encoding table. – Rodrigo Oliveira Jun 08 '15 at 22:15
  • It seems to me that the array of strings is the _only_ case you deal with where the input data can be said to have any meaningful "radix" at all. In all the other cases you are just dealing with the plain binary integer values that are stored in the array. When you come to the case of the strings, you end up sorting the pointers themselves in increasing size. The result has nothing to do with whatever data the pointers are pointing _to_. – David K Jun 08 '15 at 22:39
  • As pointed out by David K, integers are stored as binary. There's no "base", but the radix sort can sort the integers by bit fields of varying sizes. A common method for radix sort is to sort integers using 8 bit fields (least significant field first, most significant field last), so it takes 2 passes for 16 bit integers, 4 passes for 32 bit integers, and 8 passes for 64 bit integers. For signed integers, complement the sign bit of the most significant field before using it as an index, or negatively offset the array indices for the most significant field to compensate for the sign bit. – rcgldr Jun 08 '15 at 23:37
  • If the integers contain BCD (binary coded decimal), the only restriction is that the bit fields need to be a multiple of 4 bits in size. Otherwise, you can treat them the same as integers. – rcgldr Jun 08 '15 at 23:40
  • The main issue is sorting an array of strings or array of pointers to strings which I assume are fixed length in size (otherwise you have to treat shorter strings as if they have leading zeroes). This would use the same algorithm, but a different function, to deal with the strings. Instead of bit fields, you would sort by least significant byte to most significant byte (sorting by 1 byte per pass). If some order other than ASCII is wanted, you'd need to use a 256 byte conversion table to map the bytes for indexing. – rcgldr Jun 08 '15 at 23:41
  • @DavidK Yes, you are correct, in the end everything is stored as binary integer values. I can see that what I am ordering with an array of strings is the addresses held by the pointers to the char arrays. This is where I get stuck: how to detect that I have an array of pointers, so that I sort the content they point to and not their addresses, without using a different function and without asking for a parameter that says whether I have a 1D, 2D, ..., nD array. – Rodrigo Oliveira Jun 09 '15 at 00:13
  • @rcgldr Some clarifications first: of course it would be good, but for now I am not looking for unsigned or BCD `int` support. As for string length, for now I am just trying to make it work for fixed length. I am supposed to support variable length strings, but I can take care of that later (I hope so). Encodings other than ASCII would be good too, but I will focus on ASCII support for now. – Rodrigo Oliveira Jun 09 '15 at 00:20
  • @rcgldr You said I would need a different function for sorting string arrays. Is that the only way? I am sure I can write another function for those cases, but my main question here is: can it be done with a single function? I am allowed to use C++ for this algorithm, which I think could solve the problem, maybe using overloading. But I prefer doing it in C. – Rodrigo Oliveira Jun 09 '15 at 00:24
  • @RodrigoMartins - A single "generic" function could take an array of unsigned items (signed effectively requires flipping the sign bit), but treat the array as a matrix, with the number of columns being the number of bytes per unsigned item (like 4 for an integer, or perhaps 10 for a fixed length ASCII string), and a flag to indicate if the items are little endian (integers) or big endian (strings) (radix sort sorts from least significant byte to most significant byte). – rcgldr Jun 09 '15 at 01:39
