
I'm hoping this isn't a duplicate question, but I've searched in some detail and haven't found my exact case before.

I have a simple struct that I also want to be able to access as a simple byte array:

union
{
  struct
  {
    unsigned char a;
    unsigned char b;
    // ... Some other members ...
    unsigned char w;
  };
  unsigned char bytes[sizeof( what? )];
} myUnion;

Notice the struct is not named and it also isn't given its own member name. This is so that I can use myUnion.a to access that member, and not myUnion.myStruct.a.

However, without some name, how can I get the size of the struct for myUnion.bytes[] other than manually calculating it each time I change something?

My current workaround is to use a #define to make up for the myUnion.myStruct problem, but that has the negative side-effect of ruining my auto-complete in the editor, and also makes my data structures harder to understand.

Any ideas?

Note: This is running on an 8-bit processor. There are no issues with word alignment and such. That said, any caveats should probably be stated so someone else doesn't use a proposed solution inappropriately.
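Whichever layout is chosen, a C11 `_Static_assert` can at least turn a hand-written `bytes[]` size that has fallen out of step with the struct into a compile error instead of a silently truncated byte view. This is a minimal sketch, assuming a C11 compiler (which the anonymous struct needs anyway) and a hypothetical tagged type `union packet` whose three members stand in for the real ones:

```c
/* Hypothetical stand-in for the question's union. Anonymous structs
   inside unions require C11 (or a compiler extension). */
union packet {
    struct {
        unsigned char a;
        unsigned char b;
        /* ... some other members ... */
        unsigned char w;
    };
    unsigned char bytes[3]; /* size still written by hand */
};

/* If a member is added (or the compiler pads the struct) and bytes[]
   is not updated, the union grows past bytes[] and the build fails.
   sizeof does not evaluate its operand, so the null pointer is never
   dereferenced. */
_Static_assert(sizeof(union packet) == sizeof(((union packet *)0)->bytes),
               "bytes[] no longer covers the whole struct");
```

It still does not compute the size for you. It only guarantees that a stale size cannot compile.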

timrau
gkimsey
  • Use offsetof(). http://www.cplusplus.com/reference/cstddef/offsetof/. Can get any wrong result imo. – this May 07 '14 at 18:56
  • This is not how a union is supposed to be used and is very dangerous. If you are lucky, the compiler won't add any padding to your struct and it'll work. – Red Alert May 07 '14 at 18:57
  • @RedAlert you can "ask the compiler" not to add padding between the elements, but whenever you see something like this you should assume that it isn't going to be portable. – Grady Player May 07 '14 at 19:00
  • Side remark: Anonymous structs inside unions/other classes are not allowed in standard C++. They're not even allowed in C99, finally C11 has added (official) support for them. g++ and clang++ support them as a language extension. – dyp May 07 '14 at 20:15
  • I think it would be a lot simpler to just have a normal struct; and then access it as `bytes` via a cast or a function call. The fewer non-standard constructs and hacks you use, the fewer headaches you will have down the track – M.M May 08 '14 at 02:04
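M.M's suggestion might look like the following in C: a plain struct with no union, plus a small function that exposes its bytes. This is only a sketch; `struct my_struct`, `struct_bytes` and `MY_STRUCT_SIZE` are hypothetical names. Accessing any object through an `unsigned char` pointer is allowed by C's aliasing rules, which is the same character-type exception the first answer relies on.

```c
/* Hypothetical plain struct with the question's members; no union. */
struct my_struct {
    unsigned char a;
    unsigned char b;
    /* ... some other members ... */
    unsigned char w;
};

/* Hypothetical helper: returns a pointer to the first byte of *s.
   Valid indices are 0 .. sizeof *s - 1. Character-type access is
   exempt from the strict aliasing rules. */
static unsigned char *struct_bytes(struct my_struct *s)
{
    return (unsigned char *)s;
}

/* Length of the byte view; follows the struct automatically. */
#define MY_STRUCT_SIZE sizeof(struct my_struct)

struct my_struct myStruct;
```

Members keep their plain names (`myStruct.a`), so editor auto-complete still works, and `struct_bytes(&myStruct)[i]` gives byte access for `i < MY_STRUCT_SIZE` without any manual size bookkeeping.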

3 Answers


Just get rid of the union. You can safely access any trivially-copyable structure as a byte array by casting its address to char*, and casting won't run afoul of the undefined behavior when you read from an inactive union member.

#include <cstddef>  // std::size_t

struct
{
    unsigned char a;
    unsigned char b;
    // ... Some other members ...
    unsigned char w;

    // array-style access: index into the object's own bytes
    unsigned char& operator[](std::size_t i)
    { return reinterpret_cast<unsigned char*>(this)[i]; }
} myStruct;

The reason that it's safe to cast in this manner is that char is a special exception from the strict aliasing restrictions.

For unions, the only special permission you get is for access to members which are "standard-layout structs which share a common initial sequence"... and an array unfortunately does not meet the criteria for a "standard-layout struct". I would like to see that rule change to "standard-layout struct or aggregate", but in the current wording the union version isn't safe.
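For contrast, here is a sketch of the kind of access the common-initial-sequence rule does permit. It is written in C, which has an equivalent rule (C11 6.5.2.3); the types `struct msg_a`, `struct msg_b`, `union msg` and the function `msg_kind` are hypothetical.

```c
/* Two hypothetical layouts sharing a common initial sequence:
   both start with an unsigned char named kind. */
struct msg_a { unsigned char kind; unsigned char value; };
struct msg_b { unsigned char kind; unsigned char lo, hi; };

union msg {
    struct msg_a a;
    struct msg_b b;
};

/* Reading kind through b is permitted even if a was the member last
   written, because kind lies in the common initial sequence and the
   complete union type is visible here. */
unsigned char msg_kind(const union msg *m)
{
    return m->b.kind;
}
```

A byte array has no such shared leading struct member, which is why the union-with-`bytes[]` layout falls outside this rule in C++.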


In C99, but not any version of C++, you could use a flexible array member, and not need to specify the size at all.

union
{
  struct
  {
    unsigned char a;
    unsigned char b;
    // ... Some other members ...
    unsigned char w;
  };
  unsigned char bytes[];
} myUnion;

Ben Voigt
  • yeah that seems more sane, ... or even just use a byte array – Grady Player May 07 '14 at 19:05
  • Looks like I'm not using C99, as I get the error "flexible array member in union" when trying the second solution. The first appears to require C++ for the syntactic sugar, but the basic premise should work for me using `#define myStructBytes ((char *)&myStruct)` or some such. Thanks! – gkimsey May 07 '14 at 20:45
  • @gkimsey: I didn't test it, so possibly this is one of the places in C99 where you can't use a flexible array member. The fact that the compiler recognized it as that is pretty suggestive. TonyK's version using a size of `1` is a reasonable workaround and probably works on most compilers, but I think it does break the rules. – Ben Voigt May 07 '14 at 22:15
  • n1570 6.7.2.1/18 "As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a *flexible array member*." So it seems indeed to be illegal. (Edit: ... saw the edit too late) – dyp May 07 '14 at 23:47

This will work:

union
{
  struct
  {
    unsigned char a;
    unsigned char b;
    // ... Some other members ...
    unsigned char w;
  };
  unsigned char bytes[1];
} myUnion;
TonyK
  • I had considered this, and assumed that then if I did `myUnion.bytes[2]` somewhere in my code the compiler would complain, but I just tried it and it doesn't. I think I'll go with this, although it may not be very portable. – gkimsey May 07 '14 at 20:48
  • Here's an ideone using this solution if it's of benefit to anyone. http://ideone.com/MDnXC6 – gkimsey May 07 '14 at 21:28
  • @gkimsey: It's 100% portable. The expression `myUnion.bytes[2]` is the same as `*(myUnion.bytes+2)`, which is unambiguous. – TonyK May 08 '14 at 07:22

You won't get around naming the formerly anonymous structure.
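Naming the struct might look like this minimal C sketch, where the tag `fields` and the member name `f` are hypothetical. Once the struct has a tag, `sizeof` can size the array directly; the cost is the extra `.f` in every access, which is exactly what the question was trying to avoid.

```c
/* The struct gets a tag (struct fields) and a member name (f), so
   bytes[] can be sized from it and tracks every change automatically.
   In C the tag declared here is visible outside the union as well. */
union {
    struct fields {
        unsigned char a;
        unsigned char b;
        /* ... some other members ... */
        unsigned char w;
    } f;
    unsigned char bytes[sizeof(struct fields)];
} myUnion;
```

In C, reading `myUnion.bytes` after writing `myUnion.f` reinterprets the stored bytes; in C++ the same read is undefined behavior, as the first answer points out.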

alk