
I want to reinterpret data of one type as another type in a portable way (C99). I am not talking about casting; I want a reinterpretation of some given data. Also, by portable I mean that it does not break C99 rules; I do not mean that the reinterpreted value is equal on all systems.

I know three different ways to reinterpret data, but only two of them are portable:

  1. This is not portable - it breaks the strict aliasing rule.

    /* #1 Type Punning */
    
    float float_value = 3.14;
    int *int_pointer = (int *)&float_value;
    int int_value = *int_pointer;
    
  2. This is platform dependent, because it reads an int value from the union after writing a float into it. But it does not break any C99 rules, so that should work (if sizeof(int) == sizeof(float)).

    /* #2 Union Punning */
    
    union data {
      float float_value;
      int int_value;
    };
    
    union data data_value;
    data_value.float_value = 3.14;
    int int_value = data_value.int_value;
    
  3. Should be fine, as long as sizeof(int) == sizeof(float)

    #include <string.h>  /* for memcpy */
    
    /* #3 Copying */
    
    float float_value = 3.14;
    int int_value = 0;
    memcpy(&int_value, &float_value, sizeof(int_value));
    

My Questions:

  1. Is this correct?
  2. Do you know other ways to reinterpret data in a portable way?
Flexo
Johannes
  • The $float_value should be &float_value ? – wildplasser Dec 14 '11 at 21:12
  • Reinterpretation of data gives platform dependent results. How could this work portably? For example, different platforms might represent `float` differently in memory. – Magnus Hoff Dec 14 '11 at 21:17
  • @MagnusHoff that's true, but all I need is correct ANSI C99 and a defined value – Johannes Dec 14 '11 at 21:19
  • @DanFego I read bytecode for a VM, and there I need to give the byte array a meaning – Johannes Dec 14 '11 at 21:21
  • @Johannes: "and a defined value"? But the value is *not* defined, precisely because of what I said. Or did I perhaps not get you quite right? – Magnus Hoff Dec 14 '11 at 21:21
  • @MagnusHoff the memcpy version should always give the same result on the same machine, right? That is not true for #1 and #2, as far as I know! – Johannes Dec 14 '11 at 21:22
  • There is also a semicolon missing after the union. – wildplasser Dec 14 '11 at 21:28
  • Per my understanding, #2 was supposed to be implementation-defined (not undefined) in C99. It is since TC3 (anyone can confirm this?). It's implementation-defined in C1x. – ninjalj Dec 14 '11 at 21:35
  • @ninjalj if this is true, that would fit the bill for my problem – Johannes Dec 14 '11 at 21:37
  • @Cristoph mentions a footnote to that effect in TC3 in a comment to http://stackoverflow.com/questions/6486807/c-overcoming-aliasing-restrictions-unions , which I think corresponds to Defect Report http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm – ninjalj Dec 14 '11 at 21:44
  • @ninjalj: added the footnote as an answer... – Christoph Dec 15 '11 at 00:45
  • This article discusses this exact problem: http://blog.regehr.org/archives/959 – Tor Klingberg Jan 16 '15 at 11:37

5 Answers


Solution 2 is portable - type punning through unions has always been legal in C99, and it was made explicit with TC3, which added the following footnote to section 6.5.2.3:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Annex J still lists it as unspecified behaviour, which is a known defect and has been corrected in C11, which changed

The value of a union member other than the last one stored into [is unspecified]

to

The values of bytes that correspond to union members other than the one last stored into [are unspecified]

It's not that big a deal as the annex is only informative, not normative.

Keep in mind that you can still end up with undefined behaviour, e.g.

  • by creating a trap representation
  • by violating aliasing rules in case of members with pointer type (which should not be converted via type-punning anyway as there need not be a uniform pointer representation)
  • if the union members have different sizes - only the bytes of the member last used in a store have specified value; in particular, storing values in a smaller member can also invalidate trailing bytes of a larger member
  • if a member contains padding bytes, which always take unspecified values
Christoph
  • Does the fixed paragraph from Annex J ("The values of bytes that correspond to union members other than the one last stored into [are unspecified]") mean that bytes that are outside the ones in the object representation of the member that is written have unspecified value? Because this is exactly what 6.2.6.1 paragraph 7 says (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf). Thank you. – user42768 Sep 14 '18 at 08:29
  1. The union solution is as defined as the memcpy one in C (AFAIK, it is UB in C++), see DR283

  2. It is possible to cast a pointer to a pointer to char (plain, signed or unsigned), so

    unsigned char *ptr = (unsigned char*)&floatVar;
    

    and then accessing ptr[0] to ptr[sizeof(floatVar)-1] is legal.

AProgrammer
  • that's true, but that would be a reinterpretation as a byte array, and I need an int – Johannes Dec 14 '11 at 21:49
  • It is UB in C++ except in the special case of POD-structs `If a POD-union contains several POD-structs that share a common initial sequence (9.2), and if an object of this POD-union type contains one of the POD-structs, it is permitted to inspect the common initial sequence of any of POD-struct members` – Dave Rager Dec 14 '11 at 21:57
  • @Dave, yes, I didn't mention the X hack because it doesn't provide a way to do type punning. – AProgrammer Dec 14 '11 at 21:58

To be safe, I'd go with a byte array (unsigned char) rather than an 'int' to hold the value.

Keith Nicholas
  • Then another question arises: How does one reinterpret `int` (or an arbitrary other type) as an `unsigned char []`? –  Dec 14 '11 at 21:15
  • @Keith a byte array would interpret the data as a sequence of bytes. What I need is a reinterpretation – Johannes Dec 14 '11 at 21:16
  • to make sure the float is fully represented, i.e., if sizeof(float) != sizeof(int) – Keith Nicholas Dec 14 '11 at 22:10

The data type int is an example of a non-portable type, since endianness can change the byte order between platforms.

If you want to be portable, you need to define your own types and then implement them on each platform you want to port to. Then define conversion functions for your data types. As far as I know, that is the only way to have full control of byte order etc.

AndersK
  • thanks for your answer; I think I was not clear that all I need is correct ANSI C99 and a defined value – Johannes Dec 14 '11 at 21:39

If you want to avoid the strict aliasing rule, you need to first cast to a char pointer:

    float float_value = 3.14;
    int *int_pointer = (int *)(char *)&float_value;
    int int_value = *int_pointer;

Note, however, that you might have sizeof(int) > sizeof(float), in which case you still get undefined behavior.

Chris Dodd