
Let's consider a union of integers of different sizes. Is it guaranteed that if a number fits the range of each of the integer types, it can be written to and read out from any of the union data members correctly?

E.g. this code

  union U {
    int32_t i;
    int16_t s;
  } u;
  u.s = 1000;
  std::cout<<u.i<<std::endl;

I verified that it correctly prints "1000" on one computer. Is it guaranteed to work the same on any other system? I assume that on a given system the endianness is the same for every integer type, so the real question is whether the union is guaranteed to place the smaller integer in the least significant bytes of the larger one.

I know this has no chance to work for negative numbers, so let's consider non-negative numbers only.

timrau
user1079505
  • Try preceding this code with `u.i = 0xdeadbeef` and see if you still get the right answer. – Nate Eldredge Apr 29 '21 at 23:44
  • Why? Is there something wrong with casting all of a sudden? – user207421 Apr 29 '21 at 23:59
  • Using unions for type punning is frowned on in C++ because it has many, many failure cases. – user4581301 Apr 30 '21 at 00:00
  • Writing to one union member and reading from another union member is undefined behavior. (I think it is okay in C. But we're talking C++.) The C++ way is to memcpy from the one to the other, or do it with static_cast, or do it with math, depending on what you need. – Eljay Apr 30 '21 at 00:04
  • @NateEldredge yes, I understand that writing to `u.s` modifies only two of the bytes, so I need to ensure independently that the other two are 0. This isn't what I was asking about (but it seems I got an answer below). – user1079505 Apr 30 '21 at 00:41

2 Answers

No, that only works on little-endian machines. On big-endian machines the bytes are stored starting from the most significant one. 0xdeadbeef is stored in memory as ef be ad de in little endian and as de ad be ef in big endian, so reading 2 bytes from the start address yields 0xbeef and 0xdead on those machines, respectively.

However, this is undefined behavior in C++: reading a union member other than the one most recently written is not allowed, although C permits it and some compilers support it as an extension. Even where it is supported, you write the smaller 2-byte field first and then read the larger one, so the remaining 2 bytes of the int32_t contain garbage unless they were set beforehand.

If you write the larger field first and then read the smaller one, it works even for signed types, as long as the value fits in the smaller type:

  u.i = -1000;
  std::cout << u.s << '\n';

This prints -1000 as expected on a little-endian machine whose compiler supports this kind of union access.

phuclv
  • Using union is not undefined behavior in C++. But it is UB to read u.i after writing to u.s; although a non-standard language extension may allow it. – thelizardking34 Apr 30 '21 at 13:23
  • Assuming little endian, if I write `u.i = 0; u.s = 1000;` would the upper 2 bytes contain garbage, or can I expect them to be zeros? – user1079505 May 01 '21 at 21:17
  • you write 4 bytes, and only read the first 2. How can the latter 2 bytes be garbage? – phuclv May 01 '21 at 23:07

The union is at least as big as necessary to hold its largest data member, and is usually no larger.

The other data members are intended to be allocated in the same bytes as part of that largest member.

The details of how the other members are allocated are implementation-defined. By this definition, the answer is no, it's not guaranteed. See the "Explanation" section of https://en.cppreference.com/w/cpp/language/union

Since C++14, all non-static data members of the union will have the same address; but that doesn't say anything about endianness or implementation support.

thelizardking34