Below is a stripped-down example of a tagged union template "Storage", which holds one of two types L and R in a union, plus a bool indicating which of them is stored. The instantiation uses two types of different size, the smaller one actually being empty.
#include <utility>

struct Empty
{
};

struct Big
{
    long a;
    long b;
    long c;
};

template<typename L, typename R>
class Storage final
{
public:
    constexpr explicit Storage(const R& right) : payload{right}, isLeft{false}
    {
    }

private:
    union Payload
    {
        constexpr Payload(const R& right) : right{right}
        {
        }

        L left;
        R right;
    };

    Payload payload;
    bool isLeft;
};

// Toggle constexpr here
constexpr static Storage<Big, Empty> createStorage()
{
    return Storage<Big, Empty>{Empty{}};
}

Storage<Big, Empty> createStorage2()
{
    return createStorage();
}
- The constructor initializes the R member with Empty, and calls only the union's constructor for that member
- The union is never default initialized as a whole
- All constructors are constexpr
The function "createStorage2" should therefore only populate the bool tag and leave the union alone. So I would expect the following compile result with default optimization "-O":
createStorage2():
    mov rax, rdi
    mov BYTE PTR [rdi+24], 0
    ret
Both GCC and ICC instead generate something like
createStorage2():
    mov rax, rdi
    mov QWORD PTR [rdi], 0
    mov QWORD PTR [rdi+8], 0
    mov QWORD PTR [rdi+16], 0
    mov QWORD PTR [rdi+24], 0
    ret
zeroing the entire 32-byte structure, while clang generates the expected code. You can reproduce this with https://godbolt.org/z/VsDQUu. GCC reverts to the desired initialization of only the bool tag when you remove constexpr from the "createStorage" static function, while ICC remains unimpressed and still fills all 32 bytes.
Doing so is probably not a standard violation, as the unused bytes have no defined value, which allows anything, including setting them to zero and consuming unnecessary CPU cycles. But it is annoying if you introduced the union for efficiency reasons in the first place and your union members vary widely in size.
What is going on here? Is there any way to work around this behavior, provided that removing constexpr from the constructors and the static function is not an option?
A side note: ICC seems to perform some extra operations even when all constexpr are removed, as in https://godbolt.org/z/FnjoPC:
createStorage2():
    mov rax, rdi #44.16
    mov BYTE PTR [-16+rsp], 0 #39.9
    movups xmm0, XMMWORD PTR [-40+rsp] #44.16
    movups xmm1, XMMWORD PTR [-24+rsp] #44.16
    movups XMMWORD PTR [rdi], xmm0 #44.16
    movups XMMWORD PTR [16+rdi], xmm1 #44.16
    ret #44.16
What is the purpose of these movups instructions?