0

In plain C, pointers to void are useful as arguments to generic functions, such as a generic quicksort, or a generic swap, etc., as seen here:

Implementation of generic quicksort, etc.

However, as the example of the generic partition function at that link shows:

size_t generic_partition(void *v, size_t nelems, size_t size, compare_function comp)
{
    // for brevity we'll be using the right-most 'element' for the 
    //  location of the pivot storage. a better choice would use
    //  a median-of-three or median-of-five for pivot selection,
    //  then swap it to that location before the partitioning loop.
    
    void *ploc = (unsigned char*)v + (nelems-1) * size;
    size_t pvt = 0;

    for (size_t i=0; i<nelems; ++i)
    {
        if (comp((unsigned char*)v + (i*size), ploc) < 0)
            generic_swap((unsigned char*)v + (pvt++*size), (unsigned char*)v + (i*size), size);
    }

    generic_swap((unsigned char*)v + (pvt*size), ploc, size);

    return pvt;
}

it can be quite heavy on the pointer arithmetic and type casting, inside a loop. The reason seems to be that we must frequently cast between void* and char*, because char* supports pointer arithmetic, while void* does not (because the element size is unknown).

So why don't we use char* instead of void*, to implement generic arguments to generic functions ? It seems to have the advantages of void* (generality, since doesn't every type occupy some integer multiple of one byte ?) with the additional advantage of supporting pointer arithmetic.

Using pointers to char only, without pointers to void, wouldn't the loop in the above code require fewer additions, no multiplications, and no type casts ? If so, then wouldn't that be more efficient ?

Perhaps I am missing something to do with functions like "comp" automatically converting the arguments passed to them into the expected type ?

Perhaps I am missing the point generally.

EDIT (To include code for generic_swap):

void generic_swap(void *a, void *b, size_t size)
{
    if (a == b)
        return;

    unsigned char *p1 = a, *p2 = b;
    while (size-- > 0)
    {
        unsigned char c = *p1;
        *p1++ = *p2;
        *p2++ = c;
    }
}

EDIT - This is what I meant by avoiding multiplications (for what it's worth). I've written it as a modification of dbush's nice answer:

size_t generic_partition(void *v, size_t nelems, size_t size, compare_function comp)
{
    // for brevity we'll be using the right-most 'element' for the 
    //  location of the pivot storage. a better choice would use
    //  a median-of-three or median-of-five for pivot selection,
    //  then swap it to that location before the partitioning loop.
    
    unsigned char *pboundary = v;
    unsigned char *ploc = pboundary + (nelems-1) * size;
    size_t pvt = 0;

    unsigned char *pcurrent = pboundary + size;

    while (pcurrent < ploc)
    {
        if (comp(pcurrent, ploc) < 0)
        {
            generic_swap(pboundary, pcurrent, size);
            pboundary = pboundary + size;
            pvt++;
        }
        pcurrent = pcurrent + size;
    }

    generic_swap(pboundary, ploc, size);

    return pvt;
}

EDIT: Yet more clarification re. arithmetic:

In case the comparison pcurrent < ploc in the while condition is not defined, maybe it could be replaced by pcurrent != ploc.

Also, I know that there are edge cases that may fail, for example if the array has only one element, but I assume that I would be careful about those in real life, and here my question is more about void*, char* and pointer arithmetic.

EDIT: (The fourth !) I'd happily revert the "while" loop to a "for" loop as in the original, at the cost of an extra variable (the iterator) in return for avoiding possibly tricky behaviour when comparing the pointers via < or via !=.

EDIT: (The fifth) Hold up, wait a minute - something ain't right ! I've just gone back to the book "The C programming language" by Kernighan and Ritchie, where I've found (a variant of) this generic partitioning code included in a generic implementation of quicksort (which is what I am actually trying to implement). To my surprise, although it seems to be perfectly generic, yet it seems to make no use of pointer arithmetic, and no use of casting to chars or unsigned chars ! Am I missing something, or have the many authors of courses and websites that I've consulted been greatly overcomplicating things ? Here is the code from Kernighan and Ritchie's book:

void qsort(void *v[], int left, int right, int (*comp)(void *, void *))
{
    int i, last;
    void swap(void *v[], int, int);

    if (left >= right)
        return;
    swap(v, left, (left + right)/2);
    last = left;
    for (i = left + 1; i <= right; i++)
        if ((*comp)(v[i], v[left]) < 0)
            swap(v, ++last, i);
    swap(v, left, last);
    qsort(v, left, last-1, comp);
    qsort(v, last+1, right, comp);
}

EDIT (The sixth !!!) My apologies all, I am evidently even more of a noob than I had realised: My source reliably informs me that this Kernighan-Ritchie "generic" quicksort operates on an array of pointers always, and that the generic-ness must therefore be aided by the programmer making sure to feed it an array of pointers into the actual array they are trying to sort. I can see that the first code I posted, with its casting to char* etc., achieves generic-ness in a different way.

Simon
  • 105
  • 1
  • 5
  • 1
    How do you figure there would be fewer additions and no multiplications? The code still needs to do the arithmetic. – Eric Postpischil Apr 28 '22 at 14:08
  • 3
    Type casts are (generally) free. Pointers to `void` are used as generic pointers to objects, with implicit conversions to and from pointers to specific types. Pointers to `char` require an explicit cast. If the explicit casts everywhere cause code to be harder to read, why not add a variable of type `unsigned char *` and just do the (implicit) cast once? – Thomas Jager Apr 28 '22 at 14:09
  • 1
    That code could be simplified by using a `typedef` to make a generic “object of raw bytes” type: `typedef unsigned char S[size]; S *vs = v; S *ploc = &vs[nelems-1]; size_t pvt = 0; for (size_t i=0; i – Eric Postpischil Apr 28 '22 at 14:14
  • @EricPostpischil Doesn't that violate the type aliasing rules? – dbush Apr 28 '22 at 14:21
  • Thank you very much Eric, Thomas and dbush ! Eric, I think I imagined that rather than multiplying the iterator "i" by "size" each time, I would add "size" to a variable of type size_t each time. I think I hoped that repeated addition would be more efficient than repeated multiplication. – Simon Apr 28 '22 at 14:21
  • @dbush: None of the code there accesses any of the memory pointed to by `v`, so it cannot violate any aliasing rules, since those rules only affect what happens when accessing memory. Presumably, `generic_swap` accesses memory, by using some character type, so that would be fine. – Eric Postpischil Apr 28 '22 at 14:25
  • Using `void*` instead of `char*` as a way of dealing with pointers to incomplete types is a philosophical decision made by the writers of the standard, and arithmetic is undefined on them *for a reason*. If you don't know what you're pointing to, arithmetic on the pointer just doesn't make sense. If you do know what you're pointing to, you should just cast the `void*` to a pointer to that type. Using `void*` to type-pun and try to be sneaky is a good way to write unmaintainable and sometimes non-portable code – Willis Hershey Apr 28 '22 at 14:29
  • 1
    Generally speaking, `void*` can be safely replaced with `unsigned char*` in most places. However, it might lead to explicit casts on the caller side instead. – Lundin Apr 28 '22 at 14:33
  • @Eric Postpischil, please see the code attempt that I posted in the edit to my answer headed "EDIT - This is what I meant by avoiding multiplications (for what it's worth)". Here I have tried to reduce the number of multiplications and additions. However, even if it worked, I do not know whether the output from the compiler would actually be more efficient assembly code, or whether the compiler would produce the most efficient version in every case, anyway. – Simon Apr 28 '22 at 17:40
  • @WillisHershey: It's too bad no operators were ever defined to perform byte-based pointer arithmetic in type-agnostic fashion. On some platforms, code using byte-based pointer arithmetic can be processed more efficiently than code using pointer-target-type-based arithmetic, but the syntax is incredibly annoying. – supercat Apr 28 '22 at 20:29

1 Answers1

2

Conversions to and from void * can occur without a cast and are guaranteed to work for any object type.

You can actually eliminate the casting by assigning the void * argument to a unsigned char * once, then using that in the rest of the function:

size_t generic_partition(void *v, size_t nelems, size_t size, compare_function comp)
{
    // for brevity we'll be using the right-most 'element' for the 
    //  location of the pivot storage. a better choice would use
    //  a median-of-three or median-of-five for pivot selection,
    //  then swap it to that location before the partitioning loop.
    
    unsigned char *pstart = v;
    unsigned char *ploc = pstart + (nelems-1) * size;
    size_t pvt = 0;

    for (size_t i=0; i<nelems; ++i)
    {
        if (comp(pstart  + (i*size), ploc) < 0)
            generic_swap(pstart + (pvt++*size), pstart + (i*size), size);
    }

    generic_swap(pstart + (pvt*size), ploc, size);

    return pvt;
}

Note that you still have to multiply by the element size when constructing the pointers.

dbush
  • 205,898
  • 23
  • 218
  • 273
  • Thank you very much, dbush ! If you get a chance, please could you have a look at my second edit to my question, where I try to explain what I was thinking about arithmetic, and let me know if it seems ok ? – Simon Apr 28 '22 at 16:18