Does this type aliasing using union invoke undefined behavior?

Question

For example,

#include <cstdint>
#include <cstdio>

struct ipv4addr {
  union {
    std::uint32_t value;
    std::uint8_t parts[4];
  };
};

int main() {
  ipv4addr addr;
  addr.value = static_cast<std::uint32_t>(-1);
  std::printf("%hhu.%hhu.%hhu.%hhu",
    addr.parts[0], addr.parts[1], addr.parts[2], addr.parts[3]);
}

Per cppref,

The details of that allocation are implementation-defined, and it's undefined behavior to read from the member of the union that wasn't most recently written.

So it looks like the code invokes undefined behavior. But the page also says

If two union members are standard-layout types, it's well-defined to examine their common subsequence on any compiler.

I don't quite understand this. Does it make the code behavior well-defined?

Also note cppref's description of type aliasing.

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

[...]

AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

I guess this applies to std::uint8_t as well, doesn't it?

Does [this question](https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior) answer yours? — 1201ProgramAlarm, Mar 12 '19 at 04:12
[This](https://stackoverflow.com/questions/54762186/unions-aliasing-and-type-punning-in-practice-what-works-and-what-does-not/54763849#54763849) may be informative. — Passer By, Mar 12 '19 at 04:17
Possible duplicate of [Accessing inactive union member and undefined behavior?](https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior) — Dmytro Dadyka, Mar 12 '19 at 04:25
I admit to having no freaking clue what happens to a `uint8` on a system where a byte is 16 bits. Time for some Standard-diving. — user4581301, Mar 12 '19 at 04:31
Ah ha! It's optional! In other words, if a byte isn't 8 bits, this probably won't compile. — user4581301, Mar 12 '19 at 04:47
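Both points raised in the comments above can be checked at compile time. This is a minimal sketch assuming a C++11 compiler; the constant `uint8_is_unsigned_char` is a hypothetical name. If it is true on your platform, reading through `std::uint8_t` really is reading through `unsigned char`, and the quoted byte-access exception applies.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// <cstdint> defines UINT8_MAX exactly when it provides std::uint8_t,
// so the macro doubles as a feature test for the optional typedef.
#ifdef UINT8_MAX
// A type of exactly 8 bits with no padding can exist only if a byte is 8 bits.
static_assert(CHAR_BIT == 8, "std::uint8_t implies 8-bit bytes");

// The byte-access exception names char, unsigned char and std::byte.
// std::uint8_t is unsigned char on mainstream platforms, but the standard
// does not require that, so test it rather than assume it.
constexpr bool uint8_is_unsigned_char =
    std::is_same<std::uint8_t, unsigned char>::value;
#else
constexpr bool uint8_is_unsigned_char = false;  // no std::uint8_t at all
#endif
```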

score 3 · Accepted Answer · answered Mar 12 '19 at 04:58

Common initial subsequence has a ridiculously specific definition. `int` and `struct foo{int x;}` do not have a common initial subsequence.

`struct foo{int x;}` and `struct bar{int y;}` do have a common initial subsequence.
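A minimal sketch of what the rule does permit, using the answer's `foo` and `bar`. The second member of each struct and the names `foo_or_bar` and `read_through_other_member` are hypothetical additions, chosen so that only the leading `int` is common to both.

```cpp
// foo and bar follow the answer; the second members are hypothetical extras,
// so the common initial sequence is just the leading int.
struct foo { int x; double d; };
struct bar { int y; char c; };

// A standard-layout union whose members are standard-layout structs.
union foo_or_bar {
  foo f;
  bar b;
};

// Writes through f, then reads the shared leading int through b. The standard
// allows reading members of the common initial sequence through an inactive
// member of such a union, so this read is well-defined.
int read_through_other_member(int v) {
  foo_or_bar u;
  u.f = foo{v, 0.0};  // assignment makes f the active member
  return u.b.y;       // OK: y pairs with f.x in the common initial sequence
  // Reading u.b.c instead would be undefined: c is outside that sequence.
}

// By contrast, in union { int i; foo f; }, writing i and reading f.x is not
// covered: int is not a struct, so there is no common initial sequence.
```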

Reading memory through an unrelated type is not the same as reading from a union alternative. The aliasing text you quoted covers the former, so it does nothing for the union read.

You can do `(std::uint8_t const*)&addr.value` and treat it as a 4-byte array, assuming your platform has `uint8_t`. The byte values you get are implementation-defined.
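A minimal sketch of that byte-wise read, with a hypothetical function name `object_bytes`. It uses `unsigned char` rather than `std::uint8_t`, because the aliasing exception names only `char`, `unsigned char` and `std::byte`, and `std::uint8_t` is merely usually an alias for `unsigned char`.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uint32_t) == 4, "sketch assumes 8-bit bytes");

// Copies the object representation of value into out by reading it through
// a const unsigned char pointer; the aliasing rules allow byte-wise access to
// any object. Which byte lands in out[0] depends on the platform's byte order.
void object_bytes(std::uint32_t value, unsigned char (&out)[4]) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
  for (std::size_t i = 0; i < sizeof value; ++i) {
    out[i] = p[i];
  }
}
```

For `0xFFFFFFFF` every byte is 255 on any platform; for a value like `0x01020304` the order of the bytes in `out` reveals the platform's endianness.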

However, under the standard you cannot read from `parts[i]` while `value` is the active member.

Compilers are free to define behaviour that the standard leaves undefined, except during compile-time constexpr evaluation.
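The constexpr exception can be seen directly. This sketch assumes C++14 or later and uses a hypothetical union `u32_or_bytes` shaped like the question's: reading the active member is accepted in a constant expression, while the inactive read described in the final comment is something a conforming compiler must reject there.

```cpp
#include <cstdint>

// Hypothetical union mirroring the question's ipv4addr.
union u32_or_bytes {
  std::uint32_t value;
  unsigned char parts[4];
};

// Reads the member that was just written, which is fine in a constant
// expression (C++14 or later, for the multi-statement constexpr body).
constexpr std::uint32_t read_active_member() {
  u32_or_bytes u{};  // value-initializes value, the first member
  u.value = 0xFFFFFFFFu;
  return u.value;
}

static_assert(read_active_member() == 0xFFFFFFFFu, "active member read");

// If the function returned u.parts[0] instead, the static_assert would fail to
// compile: reading an inactive union member is not allowed during constant
// evaluation, whatever the compiler permits at run time.
```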

Members of a union are pointer-interconvertible, do you not get the same lvalue from the cast compared to accessing directly? — Passer By, Mar 12 '19 at 05:06

score 1 · Answer 2 · answered Mar 12 '19 at 04:15

Reading the contents of a union member that was not most recently assigned is indeed undefined behavior. However, almost all compilers I've used have non-standard extensions that allow it.

Technically, if you want this to be safe with no chance of failure, you should store the IP as an int32_t and use reinterpret_cast to read the individual bytes, like so:

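#include <cstdint>

std::int32_t ip = 185734;
// Use unsigned char rather than int8_t: int8_t is normally signed char,
// which C++'s aliasing exception (char, unsigned char, std::byte) omits.
const unsigned char *ip_bytes = reinterpret_cast<const unsigned char *>(&ip);
unsigned first = ip_bytes[0];  // ip_bytes[0] .. ip_bytes[3]; order depends on endianness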

However, keep in mind that the endianness of your platform determines the byte order of any 32-bit read or write. Therefore it may be safer to scrap the int32_t idea entirely and just use an array of bytes. It all depends on what you need and what any library you are using requires.
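Since byte order decides what the byte-wise read shows, here is a minimal runtime check for it; the function name `is_little_endian` is hypothetical. It copies the first byte of a known value with `std::memcpy`, which is a well-defined way to inspect an object representation.

```cpp
#include <cstdint>
#include <cstring>

// Returns true on a little-endian platform, i.e. when the lowest-addressed
// byte of a std::uint32_t holds its least significant byte.
// std::memcpy copies the object representation, so no aliasing rule is broken.
bool is_little_endian() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

In C++20, `std::endian::native` from `<bit>` answers the same question at compile time.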

Guess I will stick to `std::uint32_t`, but get `parts` by repeated `% 256` and `/ 256`. This also removes the dependency on endianness. — Lingxi, Mar 12 '19 at 08:57
@Lingxi Either that or bitwise, whichever is clearer. The compiler will optimize both the same way regardless. And I'd recommend making a wrapper type for it, just for readability and reliability. — Cruz Jean, Mar 12 '19 at 21:27
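The two comments above combine into a small wrapper type. This is a sketch under the assumption that the address is kept in host order as a `std::uint32_t`; the class name `ipv4` and its members are hypothetical. The octets are derived with `% 256` and `/ 256`, so no union or pointer cast is involved and the result does not depend on byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical wrapper type: keeps the address as one std::uint32_t and
// derives the octets arithmetically instead of reading memory byte-wise.
class ipv4 {
 public:
  explicit ipv4(std::uint32_t v) : value_(v) {}

  // Octets in dotted-decimal order: most significant first.
  std::array<std::uint8_t, 4> octets() const {
    std::array<std::uint8_t, 4> out{};
    std::uint32_t v = value_;
    for (std::size_t i = out.size(); i-- > 0;) {
      out[i] = static_cast<std::uint8_t>(v % 256);  // lowest remaining octet
      v /= 256;  // for unsigned values, the same as v >>= 8
    }
    return out;
  }

  void print() const {
    const std::array<std::uint8_t, 4> o = octets();
    std::printf("%u.%u.%u.%u\n", static_cast<unsigned>(o[0]),
                static_cast<unsigned>(o[1]), static_cast<unsigned>(o[2]),
                static_cast<unsigned>(o[3]));
  }

 private:
  std::uint32_t value_;
};
```

With this, `ipv4(static_cast<std::uint32_t>(-1)).print()` prints `255.255.255.255` and `ipv4(0xC0A80001u).print()` prints `192.168.0.1` on any byte order. Swapping the `%`/`/` pair for `& 0xFF` and `>> 8` is purely a readability choice.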

score -2 · Answer 3 · answered Mar 12 '19 at 04:11


That is well-defined and does exactly what an fwrite followed by four get calls to read the bytes back would do. The actual result is machine-dependent: "get me the bytes for this uint32" in whatever storage order the CPU prefers.
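The answer compares the union read to writing the integer out with fwrite and reading it back byte by byte. A sketch of that round trip, assuming `std::tmpfile` is available and taking `std::fgetc` as the "get" call; the function name `bytes_via_file` is hypothetical. The bytes come back in the platform's storage order, the same ones a byte-wise view of the object gives, which is the machine dependence the answer mentions.

```cpp
#include <cstdint>
#include <cstdio>

// Writes value to a temporary binary file with std::fwrite, rewinds, and
// reads the four bytes back with std::fgetc. Returns false if any step fails.
bool bytes_via_file(std::uint32_t value, unsigned char (&out)[4]) {
  std::FILE* f = std::tmpfile();  // opened in "wb+" mode
  if (f == nullptr) return false;
  bool ok = std::fwrite(&value, sizeof value, 1, f) == 1;
  std::rewind(f);
  for (int i = 0; ok && i < 4; ++i) {
    const int c = std::fgetc(f);  // EOF, or the next byte as an int
    if (c == EOF) {
      ok = false;
    } else {
      out[i] = static_cast<unsigned char>(c);
    }
  }
  std::fclose(f);
  return ok;
}
```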


Joshua

