After receiving the following statement in an answer to this question:

...you are trying to overlay value and bits, and stuffing data into one alternative of an union and taking it out of the other is undefined.

I became much more curious about what is allowed (and what is prudent) with regard to type punning in C99. After looking around, I found a lot of helpful information in the post Is type-punning through a union unspecified in C99....

There was a lot to take away from both the comments and the answers posted there. For clarity (and as a sanity check), I wanted to create an example based on my understanding of the C99 standard. Below is the example code I created; while it behaved as I anticipated, I want to be certain that my assertions are correct.

The following code contains my assertions in the comments. This is my understanding of type-punning in C99. Are my comments correct? If not, can you please explain why?

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BYTES   sizeof(uint32_t)
typedef union
{
    uint32_t fourByteValue;
    uint8_t  byteValue[NUM_BYTES];
    struct
    {
        unsigned int firstBitSpecified  :   1;
        unsigned int secondBitSpecified :   1;
        unsigned int thirdBitSpecified  :   1;
        unsigned int fourthBitSpecified :   1;
        unsigned int paddingBits        :   4;
        uint8_t  oneByteStructValue;
        uint16_t twoByteStructValue;
    };
} U;

int main (void)
{
    const char border[] = "==============================\n";
    U myUnion;
    uint8_t byte;
    uint32_t fourBytes;
    uint8_t i;

    myUnion.fourByteValue = 0x00FFFFFF;
    fourBytes = myUnion.fourByteValue;  /* 1. This is not type-punning. */
    printf("No type-punning fourByteValue:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);


    printf("Type-punning byteValue:\n%s", border);
    for (i = 0; i < NUM_BYTES; i++)
    {
        byte = myUnion.byteValue[i];   /* 2. Type-punning allowed by C99, 
                                             no unspecified values. */
        printf ("byte[%d]\t\t= 0x%.2x\n", i, byte);
    }
    printf("\n");

    myUnion.byteValue[3] = 0xff;
    fourBytes = myUnion.fourByteValue; /* 3. Type-punning allowed by C99 
                                             but all other 'byteValue's
                                             are now unspecified values. */
    printf("Type-punning fourByteValue:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    myUnion.firstBitSpecified = 0;
    myUnion.thirdBitSpecified = 0;
    fourBytes = myUnion.fourByteValue; /* 4. Again, this would be allowed, but 
                                             the bit that was just assigned
                                             a value of 0 is implementation
                                             defined AND all other bits are
                                             unspecified values. */
    printf("Type-punning firstBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    myUnion.fourByteValue = 0x00000001;
    fourBytes = myUnion.firstBitSpecified; /* 5. Type-punning allowed, although
                                                 which bit you get is implementation
                                                 specific. */
    printf("No type-punning, firstBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);
    fourBytes = myUnion.secondBitSpecified;
    printf("No type-punning, secondBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    return (EXIT_SUCCESS);
}

The above code was compiled with mingw32-gcc.exe -Wall -g -std=c99 on a 64-bit Windows 7 machine. Running it produces the following output:

No type-punning fourByteValue:
==============================
fourBytes       = 0xffffff

Type-punning byteValue:
==============================
byte[0]         = 0xff
byte[1]         = 0xff
byte[2]         = 0xff
byte[3]         = 0x00

Type-punning fourByteValue:
==============================
fourBytes       = 0xffffffff

Type-punning firstBitSpecified:
==============================
fourBytes       = 0xfffffffa

No type-punning, firstBitSpecified:
==============================
fourBytes       = 0x0001

No type-punning, secondBitSpecified:
==============================
fourBytes       = 0x0000
embedded_guy

1 Answer

My reading of the footnote linked in that post is that type-punning through a union is never specified. Going from this, the standard says:

With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

The footnote doesn't change that. The reason is that C makes no guarantees about either (a) the byte order of numeric types, or (b) the exact layout of a struct in memory (padding between members and the allocation order of bit-fields are left to the implementation), except that the first member must sit at the very start of the struct (so that you can do the sort of casting they do in GTK to achieve polymorphism).
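A minimal sketch of the first-member guarantee mentioned above. The `Base` and `Derived` types and both function names are hypothetical, chosen only for illustration; this is not GTK's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef struct { int id; } Base;
typedef struct { Base base; double extra; } Derived;

/* Reads the id of the embedded Base through a converted pointer.
   A pointer to a struct, suitably converted, points to its first member. */
int derived_id_via_base(Derived *d)
{
    Base *b;
    assert(d != NULL);
    b = (Base *)d;
    return b->id;
}

/* Offset of the first member; no padding is allowed before it. */
size_t base_offset(void)
{
    return offsetof(Derived, base);
}
```

Only the position of the first member is pinned down this way; the offset of `extra` depends on the implementation's padding rules.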

The footnote in question addresses this line:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.

and it says this:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The "reinterpretation as an object representation in the new type" is implementation-defined (because the byte-level representation of every type is implementation-defined, taking into account things like endianness). The footnote just adds detail, pointing out that extra-surprising things might happen when you bypass the type system via unions, including producing a trap representation. Here is one definition of "trap representation":
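To make the endianness point concrete, here is a rough sketch that inspects the object representation of a `uint32_t`. The function names are made up for this example, and it assumes 8-bit bytes so that `uint32_t` occupies four of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the object representation of value into out. */
void bytes_of_u32(uint32_t value, unsigned char out[sizeof(uint32_t)])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Returns 1 for little-endian, 0 for big-endian, -1 for any other order.
   Assumes CHAR_BIT == 8, i.e. sizeof(uint32_t) == 4. */
int byte_order_of_u32(void)
{
    unsigned char b[sizeof(uint32_t)];
    assert(sizeof(uint32_t) == 4);
    bytes_of_u32(0x01020304u, b);
    if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01)
        return 1;
    if (b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04)
        return 0;
    return -1;
}
```

On a little-endian machine such as the x86 box in the question, `0x00FFFFFF` comes out as `ff ff ff 00`, which matches the `byteValue` output shown there; a big-endian machine would give `00 ff ff ff` for the same value.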

A trap representation is a set of bits which, when interpreted as a value of a specific type, causes undefined behavior. Trap representations are most commonly seen on floating point and pointer values, but in theory, almost any type could have trap representations. An uninitialized object might hold a trap representation. This gives the same behavior as the old rule: access to uninitialized objects produces undefined behavior.

The only guarantees the standard gives about accessing uninitialized data are that the unsigned char type has no trap representations, and that padding has no trap representations.

So, by replacing uint8_t with unsigned char in your post, you can avoid undefined behavior, and end up with implementation-defined behavior. As written now, however, UB is not forbidden by the standard.
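A small sketch of what that replacement could look like. The `WordBytes` union and `sum_of_bytes` are hypothetical names; the numeric results quoted below assume 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical union: the byte view uses unsigned char, which has no
   trap representations, instead of uint8_t. */
typedef union {
    uint32_t      word;
    unsigned char bytes[sizeof(uint32_t)];
} WordBytes;

/* Stores through 'word', then reads every byte back through 'bytes'.
   The sum does not depend on byte order, only on the byte values. */
unsigned sum_of_bytes(uint32_t value)
{
    WordBytes u;
    unsigned sum = 0;
    size_t i;

    assert(sizeof u.bytes == sizeof u.word);
    u.word = value;
    for (i = 0; i < sizeof u.bytes; i++)
        sum += u.bytes[i];
    return sum;
}
```

Which byte ends up at which index is still up to the implementation, so the sketch only computes something order-independent: `sum_of_bytes(0x00FFFFFFu)` is 765 whether the zero byte comes first or last.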

This is made explicit in a quote in the post you linked:

Finally, one of the changes from C90 to C99 was to remove any restriction on accessing one member of a union when the last store was to a different one. The rationale was that the behaviour would then depend on the representations of the values.

Underlying representations are, by definition, never defined by the standard.

Patrick Collins
  • Implementation-defined is different from undefined. "Undefined behaviour" means nasal demons. Unspecified or implementation-defined behaviour means that, although the standard does not deign to specify a behaviour, an otherwise correct program is not rendered incorrect because it elicits the unspecified or implementation-defined behaviour. – tmyklebu Jul 03 '14 at 00:33
  • @tmyklebu Are nasal demons not a permissible implementation of type-punning via a union? – Patrick Collins Jul 03 '14 at 00:37
  • This comment system does not let me simply say "no," so I'll ramble a little bit. – tmyklebu Jul 03 '14 at 00:38
  • @tmyklebu Reading more, I'm still pretty sure that the OP's code contains undefined behavior, although I was unaware of the difference between undefined and unspecified. I've edited to include justification, let me know if there is still an issue. – Patrick Collins Jul 03 '14 at 00:46
  • Type-punning through a union is as specified as the implementation wants it to be. Once the implementation has specified the formats of, say, `int` and `float`, punning the two in a union is OK as long as you never **use** a trap representation of, say, `float` as a `float`. If an `int` has the usual 4-byte representation and a `float` is an IEEE binary32 float with the usual representation, you can freely pun your `float` value via a union to get an `int` value, then divine things about the sign, significand, and exponent of the `float` using the `int`. – tmyklebu Jul 03 '14 at 01:07
  • The parts of the standard I know of related to type-punning are C11 (draft n1570) 6.5.2.3 p.6 and 6.2.6.1 p.4. My conclusion from these is: the former is rather useless in this case (correct me if I'm wrong) and the latter at least guarantees that `memcpy(&fourByteValue, &byteValue, sizeof fourByteValue);` is valid for type-punning. But I'm not sure about this. Maybe related: 6.7.2.1 p.16, second sentence. – mafso Jul 03 '14 at 01:08
  • Regarding "are nasal demons a permissible implementation of type-punning," the answer's no, but only because `unsigned char` has no trap representation. So you can pun a `float` with an `unsigned char[4]`, say, and you get code that elicits no UB if you store a `float` and read it through the `unsigned char[4]`. The standard doesn't say what that code does, but it also doesn't give the compiler nasal demons licence. An implementation could hypothetically define `int` and `float` to have two completely disjoint sets of non-trap representations and be useless but conformant. – tmyklebu Jul 03 '14 at 01:09
  • @tmyklebu Does `uint8_t` behave the same as `unsigned char`? I assumed that `unsigned char` was exactly `unsigned char`, and the fact that `unsigned char` behaves the same as `uint8_t` is just a (universal) quirk of the implementation. – Patrick Collins Jul 03 '14 at 03:40
  • Not sure. (Ultimately, I don't find this stuff very exciting. Type punning works in practise because most architectures don't pick daft representations for integral types and trap representations, when they exist at all, are exceedingly rare---NaT on Itanic is the only example I know of. I'd say it's safe to assume `uint8_t` is the same thing as `unsigned char`, but I don't know whether the standard specifies or implies it.) – tmyklebu Jul 03 '14 at 03:48
  • Thanks @tmyklebu and PatrickCollins. I appreciate the insight. Type-punning via unions is done frequently across a number of platforms at my workplace and (given the same endianness) always appears to be implemented the same way. It is good to know this is a methodology I need to be more keenly aware of - particularly when porting to a new architecture. – embedded_guy Jul 03 '14 at 15:06
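
The `float`-to-integer punning described in the comments can be sketched with the `memcpy` approach mafso mentions. This is an illustration only: `FloatFields` and `float_fields` are hypothetical names, and the code assumes `float` is a 32-bit IEEE binary32 value stored with the same byte order as `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three binary32 fields. */
typedef struct {
    unsigned sign;         /* 1 bit   */
    unsigned exponent;     /* 8 bits, biased by 127 */
    uint32_t significand;  /* 23 stored fraction bits */
} FloatFields;

/* Copies the bytes of f into an integer and splits out the fields.
   Assumes IEEE binary32 with matching integer byte order. */
FloatFields float_fields(float f)
{
    uint32_t bits;
    FloatFields out;

    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);

    out.sign        = (unsigned)(bits >> 31);
    out.exponent    = (unsigned)((bits >> 23) & 0xFFu);
    out.significand = bits & 0x7FFFFFu;
    return out;
}
```

Under those assumptions, `float_fields(1.0f)` yields sign 0, exponent 127 and significand 0. Reading the integer is always safe here because `uint32_t` has no padding bits and therefore no trap representations; the opposite direction (arbitrary bits read as a `float`) is where a trap representation could in principle appear.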