16

What would be the differences between using simply a void* as opposed to a union? Example:

struct my_struct {
    short datatype;
    void *data;
};

struct my_struct {
    short datatype;
    union {
        char* c;
        int* i;
        long* l;
    };
};

Both of those can be used to accomplish the exact same thing, is it better to use the union or the void* though?

timrau
user105033
  • 5
    Pardon the cliche, but size matters :) – Tim Post Nov 30 '09 at 18:35
  • 2
    @tinkertim and odds are that they'll be the same size. On a 32 bit intel system, both will take 4 bytes. technically, void * could be larger, but it almost never will be (unless double * is larger...) – Mikeage Nov 30 '09 at 18:50
  • 1
    Aren't all pointers the same size regardless of the type of the data they point to? – Mathieu Pagé Mar 01 '16 at 12:33

11 Answers

19

I had exactly this case in our library. We had a generic string mapping module that could use different index sizes: 8, 16 or 32 bits (for historic reasons). So the source was full of code like this:

if(map->idxSiz == 1) 
   return ((BYTE *)map->idx)[Pos] = ...whatever
else
   if(map->idxSiz == 2) 
     return ((WORD *)map->idx)[Pos] = ...whatever
   else
     return ((LONG *)map->idx)[Pos] = ...whatever

There were 100 lines like that. As a first step, I changed it to a union and I found it to be more readable.

switch(map->idxSiz) {
  case 1: return map->idx.u8[Pos] = ...whatever
  case 2: return map->idx.u16[Pos] = ...whatever
  case 4: return map->idx.u32[Pos] = ...whatever
}

This allowed me to see more clearly what was going on. I could then decide to completely remove the idxSiz variants using only 32-bit indexes. But this was only possible once the code got more readable.

PS: That was only a minor part of our project, which is several hundred thousand lines of code written by people who are no longer around. The changes to the code have to be gradual in order not to break the applications.

Conclusion: Even if people are less used to the union variant, I prefer it because it can make the code much lighter to read. On big projects, readability is extremely important, even if you are the only one who will read the code later.

Edit: Added the comment, as comments do not format code:

The change to switch came before (this is now the real code as it was)

switch(this->IdxSiz) { 
  case 2: ((uint16_t*)this->iSort)[Pos-1] = (uint16_t)this->header.nUz; break; 
  case 4: ((uint32_t*)this->iSort)[Pos-1] = this->header.nUz; break; 
}

was changed to

switch(this->IdxSiz) { 
  case 2: this->iSort.u16[Pos-1] = this->header.nUz; break; 
  case 4: this->iSort.u32[Pos-1] = this->header.nUz; break; 
}

I shouldn't have combined all the beautification I did in the code and shown only that step. But I posted my answer from home, where I had no access to the code.
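The answer does not show how the union itself was declared, so the following is only a sketch of what such a declaration and its accessors might look like. The names `idx_ptr`, `map`, `map_set` and `map_get` are hypothetical, and the element size is assumed to be stored in bytes (1, 2 or 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical declaration: one pointer, viewed with different element types. */
union idx_ptr {
    uint8_t  *u8;
    uint16_t *u16;
    uint32_t *u32;
};

struct map {
    int idx_size;        /* element size in bytes: 1, 2 or 4 (assumed) */
    union idx_ptr idx;   /* only the member matching idx_size is valid */
};

/* Store value at position pos; returns 0 on success, -1 on a bad idx_size. */
int map_set(struct map *m, size_t pos, uint32_t value)
{
    switch (m->idx_size) {
    case 1: m->idx.u8[pos]  = (uint8_t)value;  return 0;
    case 2: m->idx.u16[pos] = (uint16_t)value; return 0;
    case 4: m->idx.u32[pos] = value;           return 0;
    default: return -1;
    }
}

/* Read the element at position pos, widened to 32 bits (0 on a bad idx_size). */
uint32_t map_get(const struct map *m, size_t pos)
{
    switch (m->idx_size) {
    case 1: return m->idx.u8[pos];
    case 2: return m->idx.u16[pos];
    case 4: return m->idx.u32[pos];
    default: return 0;
    }
}
```

The casts that remain are value conversions only; the pointer casts have moved into the choice of union member.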

Patrick Schlüter
  • 7
    I think the readability comes from using a switch instead of nested ifs. – swegi Dec 01 '09 at 05:58
  • 1
    The change to switch came before (this is now the real code as it was) switch(this->IdxSiz) { case 2: ((uint16_t*)this->iSort)[Pos-1] = (uint16_t)this->header.nUz; break; case 4: ((uint32_t*)this->iSort)[Pos-1] = this->header.nUz; break; } was changed to switch(this->IdxSiz) { case 2: this->iSort.u16[Pos-1] = this->header.nUz; break; case 4: this->iSort.u32[Pos-1] = this->header.nUz; break; } I shouldn't have combined all the beautification I did in the code and only show that step. But I posted my answer from home where I had no access to the code. – Patrick Schlüter Dec 01 '09 at 11:16
12

In my opinion, the void pointer and explicit casting is the better way, because it is obvious for every seasoned C programmer what the intent is.

Edit to clarify: If I see the said union in a program, I would ask myself if the author wanted to restrict the types of the stored data. Perhaps some sanity checks are performed which make sense only on integral number types. But if I see a void pointer, I directly know that the author designed the data structure to hold arbitrary data. Thus I can use it for newly introduced structure types, too. Note that it could be that I cannot change the original code, e.g. if it is part of a 3rd party library.

swegi
  • intent and use are two different things when it comes to memory consumption. A union is as big as its biggest member. – Tim Post Nov 30 '09 at 18:30
  • But in this case, all the members in the union are pointers, so it should be the same size as a void *, right? – FatDaemon Nov 30 '09 at 18:38
  • 1
    nothing in the spec requires pointers to be the same size, but void * has to be large enough to handle any other pointer. That said, in practice, pointers are generally the same size – Mikeage Nov 30 '09 at 18:51
  • 3
    Just curious, how is the intent any less clear in the union version? – Dan Olson Nov 30 '09 at 20:08
  • @Dan Olson -- because C, historically, always used a void * for this. That's what they're intended for. – Mikeage Dec 01 '09 at 08:26
  • 1
    Wouldn't you say that this usage also covers the intent of unions? – Dan Olson Dec 04 '09 at 01:25
  • @Dan: not with regard to indirection or dynamically allocated buffers. While you can use a union to accommodate all possible data types, ADT implementations in C always use void*, because it's not desirable to restrict it to hold only certain data types. – Michael Foukarakis Dec 16 '09 at 08:36
8

It's more common to use a union to hold actual objects rather than pointers.

I think most C developers that I respect would not bother to union different pointers together; if a general-purpose pointer is needed, just using void * certainly is "the C way". The language sacrifices a lot of safety in order to allow you to deliberately alias the types of things; considering what we have paid for this feature we might as well use it when it simplifies the code. That's why the escapes from strict typing have always been there.

DigitalRoss
6

The union approach requires that you know a priori all the types that might be used. The void * approach allows storing data types that might not even exist when the code in question is written (though doing much with such an unknown data type can be tricky, such as requiring passing a pointer to a function to be invoked on that data instead of being able to process it directly).

Edit: Since there seems to be some misunderstanding about how to use an unknown data type: in most cases, you provide some sort of "registration" function. In a typical case, you pass in pointers to functions that can carry out all the operations you need on an item being stored. It generates and returns a new index to be used for the value that identifies the type. Then when you want to store an object of that type, you set its identifier to the value you got back from the registration, and when the code that works with the objects needs to do something with that object, it invokes the appropriate function via the pointer you passed in. In a typical case, those pointers to functions will be in a struct, and it'll simply store (pointers to) those structs in an array. The identifier value it returns from registration is just the index into the array of those structs where it has stored this particular one.
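The registration scheme described above could be sketched roughly as follows. All names here (`type_ops`, `register_type`, `item`, `item_compare`, `int_ops`) and the fixed table size are assumptions made for illustration; a real library would choose its own set of operations and its own error handling.

```c
#include <stddef.h>
#include <stdio.h>

/* Operations each registered type must supply (a hypothetical set). */
struct type_ops {
    void (*print)(const void *data, FILE *out);
    int  (*compare)(const void *a, const void *b);
};

#define MAX_TYPES 16

static const struct type_ops *registry[MAX_TYPES];
static int registry_count = 0;

/* Register a type; returns its identifier (an index into the registry),
 * or -1 if ops is NULL or the table is full. */
int register_type(const struct type_ops *ops)
{
    if (ops == NULL || registry_count >= MAX_TYPES)
        return -1;
    registry[registry_count] = ops;
    return registry_count++;
}

struct item {
    int   type_id;   /* value returned by register_type() */
    void *data;      /* payload whose layout the container never inspects */
};

/* Dispatch through the registered function pointer.
 * Sketch only: assumes both items carry the same, valid type_id. */
int item_compare(const struct item *a, const struct item *b)
{
    return registry[a->type_id]->compare(a->data, b->data);
}

/* A client type that the code above knows nothing about. */
static void int_print(const void *data, FILE *out)
{
    fprintf(out, "%d", *(const int *)data);
}

static int int_compare(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

const struct type_ops int_ops = { int_print, int_compare };
```

A client would call `register_type(&int_ops)` once, keep the returned identifier, and put it in the `type_id` field of every item it stores.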

Jerry Coffin
2

Although unions are not common nowadays, a union suits your scenario well because it is more explicit about what is stored. In the first code sample, it is not clear what data points to.

Emre Yazici
2

My preference would be to go the union route. The cast from void* is a blunt instrument and accessing the datum through a properly typed pointer gives a bit of extra safety.

Tim Allman
  • it provides no protection; if you store a char * and access it as int *, the compiler will be perfectly happy, although if it's not aligned, you will have problems... – Mikeage Nov 30 '09 at 18:52
  • Note that I said a "bit" of safety. The truth is that C will let you make any mistake you want. – Tim Allman Nov 30 '09 at 19:08
  • which bit is that? forcing you to only cast int * to char * or vice versa? – Mikeage Nov 30 '09 at 19:23
  • For me, safety in programming is not just in using the mechanisms that the compiler provides but extends to knowing the way I think and how I make mistakes. For me, the union is better, maybe. – Tim Allman Dec 01 '09 at 12:59
2

Toss a coin. Union is more commonly used with non-pointer types, so it looks a bit odd here. However the explicit type specification it provides is decent implicit documentation. void* would be fine so long as you always know you're only going to access pointers. Don't start putting integers in there and relying on sizeof(void*) == sizeof (int).

I don't feel like either way has any advantage over the other in the end.

Dan Olson
2

It's a bit obscured in your example, because you're using pointers and hence indirection. But union certainly does have its advantages.

Imagine:

struct my_struct {
   short datatype;
   union {
       char c;
       int i;
       long l;
   };
};

Now you don't have to worry about where the allocation for the value part comes from. No separate malloc() or anything like that. And you might find that accesses to ->c, ->i, and ->l are a bit faster. (Though this might only make a difference if there are lots of these accesses.)
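As a rough sketch of how such a value-holding union is typically read back, the tag can drive a switch. The enum `value_kind` and the function `value_as_long` are hypothetical names, and the union is given a member name (`u`) here because anonymous unions are only standard from C11 onward.

```c
/* Hypothetical tag values; the question used a plain short for this. */
enum value_kind { KIND_CHAR, KIND_INT, KIND_LONG };

struct value {
    enum value_kind kind;   /* says which union member is currently valid */
    union {
        char c;
        int  i;
        long l;
    } u;                    /* stored inline: no separate allocation */
};

/* Widen whichever member is active to long. */
long value_as_long(const struct value *v)
{
    switch (v->kind) {
    case KIND_CHAR: return v->u.c;
    case KIND_INT:  return v->u.i;
    case KIND_LONG: return v->u.l;
    }
    return 0;   /* not reached for valid kinds */
}
```

Because the payload lives inside the struct, copying a `struct value` by assignment copies the data too, which is not the case with the pointer variants.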

asveikau
  • @Mikeage - Then, if the array has a fixed maximum size and that size is acceptable to put into your structure, I'd still be tempted to use a union, not containing pointers but containing arrays. Otherwise I'd use void pointers. Since the example did not include a size member I assume that wasn't what he was thinking, though. – asveikau Nov 30 '09 at 19:52
2

It really depends on the problem you're trying to solve. Without that context it's really impossible to evaluate which would be better.

For example, if you're trying to build a generic container like a list or a queue that can handle arbitrary data types, then the void pointer approach is preferable. OTOH, if you're limiting yourself to a small set of primitive data types, then the union approach can save you some time and effort.
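To illustrate the generic-container case, here is a minimal sketch of a singly linked list used as a stack. The names `node`, `list_push` and `list_pop` are made up for this example; the point is only that the container stores `void *` and never needs to know what it points to.

```c
#include <stdlib.h>

struct node {
    void        *data;   /* caller-owned payload; the list never dereferences it */
    struct node *next;
};

/* Push data onto the front of the list.
 * Returns the new head, or NULL if allocation fails (the old list is untouched). */
struct node *list_push(struct node *head, void *data)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    n->next = head;
    return n;
}

/* Pop the front node, store its payload in *out, and return the new head.
 * On an empty list, *out is set to NULL and NULL is returned. */
struct node *list_pop(struct node *head, void **out)
{
    struct node *next;

    if (head == NULL) {
        *out = NULL;
        return NULL;
    }
    *out = head->data;
    next = head->next;
    free(head);
    return next;
}
```

The same two functions work for ints, strings or structs defined long after the list was written; the cost is that the caller must remember what each payload really is and cast accordingly.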

John Bode
2

If you build your code with -fstrict-aliasing (gcc) or similar options on other compilers, then you have to be very careful with how you do your casting. You can cast a pointer as much as you want, but when you dereference it, the pointer type that you use for the dereference must match the original type (with some exceptions). You can't for example do something like:

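#include <stdio.h>

void foo(void * p)
{
   short * pSubSetOfInt = (short *)p ;
   *pSubSetOfInt = 0xFFFF ;   /* writes an int object through a short lvalue: aliasing violation */
}

void goo()
{
   int intValue = 0 ;

   foo( &intValue ) ;

   printf( "0x%X\n", intValue ) ;   /* the optimizer may assume intValue is still 0 */
}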

Don't be surprised if, when building with optimization, this prints 0 (say) instead of the 0xFFFF or 0xFFFF0000 you may expect. One way to make this code work is to do the same thing using a union, and the code will probably be easier to understand too.
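A sketch of what the union-based version might look like is given here, together with a `memcpy` variant that is also commonly used for this purpose. The names are hypothetical. The union version relies on C's rule (C99 and later, and documented by GCC) that reading a union member other than the one last written reinterprets the stored bytes; this does not carry over to C++. Which half of the int is overwritten depends on the platform's byte order.

```c
#include <string.h>

/* An int overlaid with as many shorts as fit in it. */
union int_or_shorts {
    int   as_int;
    short as_short[sizeof(int) / sizeof(short)];
};

/* Overwrite the first short-sized part of value by going through the union. */
int set_first_short(int value, short part)
{
    union int_or_shorts u;

    u.as_int = value;
    u.as_short[0] = part;
    return u.as_int;
}

/* The same effect with memcpy, which copies bytes and so never
 * accesses the int through a short lvalue. */
int set_first_short_memcpy(int value, short part)
{
    memcpy(&value, &part, sizeof part);
    return value;
}
```

Both functions touch the same leading bytes of the int, so they should agree with each other on any given platform even though the numeric result differs between little-endian and big-endian machines.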

Peeter Joot
1

A union reserves enough space for its largest member, and the members don't have to be the same size. A void* has a fixed size, whereas a union can be as large as its largest member requires.

#include <stdio.h>
#include <stdlib.h>

struct m1 {
    union {
        char c[100];
    };
};

struct m2 {
    void *c;
};

int
main(void)
{
    /* sizeof yields a size_t, which must be printed with %zu, not %d */
    printf("sizeof m1 is %zu ", sizeof(struct m1));
    printf("sizeof m2 is %zu", sizeof(struct m2));
    exit(EXIT_SUCCESS);
}

Output (on a system with 4-byte pointers): sizeof m1 is 100 sizeof m2 is 4

EDIT: assuming you only use pointers of the same size as void*, I think the union is better, as you will gain a bit of error detection when trying to set .c with an integer pointer, etc. void*, unless you're creating your own allocator, is definitely quick and dirty, for better or for worse.

Liran Orevi