6

I am making C dynamic array library, kind of. Note that I'm doing it for fun in my free time, so please do not recommend million of existing libraries.

I started implementing sorting. The array is of arbitrary element size, defined as struct:

typedef struct {
  //[PRIVATE] Pointer to array data
  void *array;
  //[READONLY] How many elements are in array
  size_t length;
  //[PRIVATE] How many elements can further fit in array (allocated memory)
  size_t size;
  //[PRIVATE] Bytes per element
  size_t elm_size;
} Array;

I originally prepared this to start with the sort function:

/** sorts the array using provided comparator method
 * if metod not provided, memcmp is used
 * Comparator signature
 *  int my_comparator ( const void * ptr1, const void * ptr2, size_t type_size );
**/
void array_sort(Array* a, int(*comparator)(const void*, const void*, size_t)) {
    if(comparator == NULL)
        comparator = &memcmp;
    // Sorting algorithm should follow
}

However I learned about qsort:

void qsort (void* base, size_t num, size_t size, int (*compar)(const void*,const void*));

Apparently, I could just pass my internal array to qsort. I could just call that:

qsort (a->array, a->length, a->elm_size, comparator_callback);

But there's a catch - qsort's comparator signature reads as:

int (*compar)(const void*,const void*)

While memcmp's signature is:

int memcmp ( const void * ptr1, const void * ptr2, size_t type_size );

The element size is missing in qsort's callback, meaning I can no longer have a generic comparator function when NULL is passed as callback. I could manually generate comparators up to X bytes of element size, but that sounds ugly.

Can I use qsort (or other sorting built-in) along with memcpy? Or do I have to chose between built-in comparator and built-in sorting function?

Tomáš Zato
  • 50,171
  • 52
  • 268
  • 778
  • "*so please do not recommend million of existing libraries.*" I laughed – Ryan Haining Nov 28 '16 at 23:38
  • The pointers passed to the compare function are going to be Array pointers. You can cast them to Array then use the length member of that structure to determine how many bytes to compare. – JJF Nov 28 '16 at 23:39
  • Isn't the element size from `qsort` meant to be your array's `elm_size`? – Jongware Nov 28 '16 at 23:40
  • "But there's a catch - qsort's comparator signature reads as:..." that is because you pass the comparison function to the last parameter, not an array. – Weather Vane Nov 28 '16 at 23:40
  • 2
    @RadLexus Yeah sure... Maybe it was unclear what I ask. My problem is that I cannot pass that size to memcpy. The default comparator function needs to know what is an element size - exactly because it receives two array pointers and doesn't know their size. – Tomáš Zato Nov 28 '16 at 23:42
  • The comparator it is not responsible for swapping anything. It knows their type because you wrote the comparison function for the specific use. The swapping is done by `qsort`. It is `qsort` that does not know their type (only the size), hence the `void*` arguments. There is no general implementation of the comparator (which is why it is an argument) - you might need to sort on a variety of `struct` fields, if the first choices are equal. – Weather Vane Nov 28 '16 at 23:53
  • @WeatherVane The comparator needs to know where to stop reading bytes from pointers it received. That's why `memcmp` requires third argument - size. `qsort` also requires size of element but doesn't pass it to comparator. That's my problem here. – Tomáš Zato Nov 29 '16 at 00:01
  • It knows their size as well as their type because *you* wrote the function for the specific use - that is the point of `qsort`. Or, you could pass an array of `struct` to `qsort`, one of whose members gives the size of another of its members (say a dynamic array). – Weather Vane Nov 29 '16 at 00:08
  • memcmp as a comparator only makes sense when you have an array of fixed-size character strings. Not enough motivation to shoehorn it into qairt. – n. m. could be an AI Nov 29 '16 at 00:09
  • Does sorting with `memcmp` even make sense? I can't think of a time I've wanted to sort an array of structs by their binary interpretation. Wouldn't it make much more sense to have the comparator passed into sort by the consumer of the function? They consumer could write the comparator with the correct knowledge of how to sort, what the element sizes are, etc. – Alexander Nov 29 '16 at 00:15
  • @AlexanderMomchliov I must admit that I didn't realize how useless is memcpy for anything but chars and fixed char lists. – Tomáš Zato Nov 29 '16 at 00:39
  • @TomášZato Yeeeeep. I think taking the comparator as a parameter to sort would be the best bet. Perhaps offer some default versions for sorting common datatypes like int, char, strings, etc. – Alexander Nov 29 '16 at 00:41

3 Answers3

4

C11 provides you with an (admittedly optional) qsort_s function, which is intended to deal with this specific situation. It allows you to pass-through a user-provided void * value - a context pointer - from the calling code to the comparator function. The comparator callback in this case has the following signature

int (*compar)(const void *x, const void *y, void *context)

In the simplest case you can pass a pointer to the size value as context

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdlib.h>
...

int comparator_callback(const void *x, const void *y, void *context)
{
  size_t elm_size = *(const size_t *) context;
  return memcmp(x, y, elm_size);
}

...
qsort_s(a->array, a->length, a->elm_size, comparator_callback, &a->elm_size);

Or it might make sense to pass a pointer to your entire array object as context.

Some *nix-based implementations have been providing a similar qsort_r function for a while, although it is non-standard.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • I'm using the library in homework assignments which require C99, but I still upvoted this because it's useful in general. – Tomáš Zato Nov 29 '16 at 00:02
1

A non-thread-safe way is use private global variable to pass the size.

static size_t compareSize = 0;

int defaultComparator(const void *p1, const void *p2) {
  return memcmp(p1, p2, compareSize);
}

void array_sort(Array* a, int(*comparator)(const void*, const void*, size_t)) {
    if(comparator == NULL) {
      compareSize = a->elm_size;
      comparator = &defaultComparator;
    }
    // Sorting algorithm should follow
}

You can make it thread-safe by make compareSize thread-local variable (__thread)

Bryan Chen
  • 45,816
  • 18
  • 112
  • 143
  • I thought of that, but that's a no-no for me, even though I don't think I'll use this with any multithreading. If that was the only option, I'd rather copy paste and edit `qsort` from GNU C library. – Tomáš Zato Nov 28 '16 at 23:59
  • Why? Global variable on C is sometimes unavoidable. By making it internal linkage, I don't see any issue with it. – Bryan Chen Nov 29 '16 at 00:01
  • I'm not so experienced with C, I didn't know that. Could you add some detail on how to use the `__thread` thing? Is it portable? – Tomáš Zato Nov 29 '16 at 00:25
  • `__thread` is an GCC extension and `thread_local` is in C11. You can read more info [here](https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C.2B.2B). – Bryan Chen Nov 29 '16 at 00:49
1

The qsort() API is a legacy of simpler times. There should be an extra "environment" pointer passed unaltered from the qsort() call to each comparison. That would allow you to pass the object size and any other necessary context in a thread safe manner.

But it's not there. @BryanChen's method is the only reasonable one.

The main reason I'm writing this answer is to point out that there are very few cases where memcmp will do something useful. There are not many kinds of objects where comparison by lexicographic order of constituent unsigned chars makes any sense.

Certainly comparing structs that way is dangerous because padding byte values are unspecified. Even the equality part of the comparison can fail. In other words,

struct foo { int i; };

void bar(void) { 
  struct foo a, b;
  a.i = b.i = 0;
  if (memcmp(&a, &b, sizeof a) == 0) printf("equal!");
}

may - by the C standard - print nothing!

Another example: for something as simple as unsigned ints, you'll get different sort orders for big- vs. little-endian storage order.

unsigned a = 0x0102;
unsigned b = 0x0201;
printf("%s", memcmp(&a, &b, sizeof a) < 0 ? "Less!" : "More!");

will print Less or More depending on the machine where it's running.

Indeed the only object type I can imagine that makes sense to compare with memcmp is equal-sized blocks of unsigned bytes. This isn't a very common use case for sorting.

In all, a library that offers memcmp as a comparison function is doomed to be error prone. Someone will try to use it as a substitute for a specialized comparison that's really the only way to obtain the desired result.

Aykhan Hagverdili
  • 28,141
  • 6
  • 41
  • 93
Gene
  • 46,253
  • 4
  • 58
  • 96
  • While it is true that specific ordering might be platform-dependent, it still does not mean that the specific ordering *matters* in OP's context. Maybe the OP only cares about grouping of equivalent elements, not the relative positions of elements. In essentially the same way element ordering inside of C++ `std::unordered_set` might (and will) differ from one platform/implementation to another, which stuill does not limit `std::unordered_set`'s usability for its intended purpose. – AnT stands with Russia Nov 29 '16 at 00:41
  • @AnT He said he's building a dynamic array library. I think it's reasonable to assume the types it contains are arbitrary. Comparing arbitrary types byte-wise lexicographically almost never makes sense. – Gene Nov 29 '16 at 02:15