
Can I put a T and a wrapped T in a union and inspect them as I like?

union Example {
    T value;
    struct Wrapped { 
       T wrapped;
    } wrapper;
};
// for simplicity T = int

Example ex;
ex.value = 12;
cout << ex.wrapper.wrapped; // ?

The C++11 standard only guarantees safe inspection of the common initial sequence, but value isn't a struct. I guess the answer is no, since wrapped types aren't even guaranteed to be memory-compatible with their unwrapped counterparts, and accessing inactive members is only well-defined for common initial sequences.

Zeta

3 Answers


I believe this is undefined behavior.

[class.mem] gives us:

The common initial sequence of two standard-layout struct types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types and either neither entity is a bit-field or both are bit-fields with the same width. [...]

In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated.

If T isn't a standard-layout struct type, this is clearly undefined behavior. (Note that int is not a standard-layout struct type, as it's not a class type at all.)

But even for standard-layout struct types, what constitutes a "common initial sequence" is based strictly on non-static data members. That is, T and struct { T val; } do not have a common initial sequence - there are no data members in common at all!

Hence, here:

template <typename T>
union Example {
    T value;
    struct Wrapped { 
       T wrapped;
    } wrapper;
};


Example<int> ex;
ex.value = 12;
cout << ex.wrapper.wrapped; // (*)

you're accessing an inactive member of the union. That's undefined.

Barry

Union behavior is undefined when accessing a member that wasn't the last one written to. So no, you can't depend on this behavior.

It's identical in principle to using a union to extract specific bytes from an integer, but with the additional risk that you now depend on the compiler not adding any padding to your struct. See "Accessing inactive union member and undefined behavior?" for more details.

UKMonkey
  • Your first paragraph doesn't hold for common initial sequences. Also, I've linked the Q&A already in my question. – Zeta Jan 02 '18 at 10:27
  • I think the answer from @Steeve and forcing the alignment with union alignas(sizeof(T)) Example { T value; struct Wrapped { T wrapped; } wrapper; }; should work – StPiere Jan 02 '18 at 10:30
  • @Zeta I'm pretty sure that it does hold. Yes, you may have linked the question, but re-read it (note that the answer there has both C11 & C++11) You'll see that it states that `value of at most one of the non-static data members can be stored in a union at any time` - and the concept of a trap representation has been removed. – UKMonkey Jan 02 '18 at 14:25
  • @UKMonkey No idea :/. Every answer on this question has at least one downvote (including the deleted one). – Zeta Jan 03 '18 at 11:50

It should work because both Example and Wrapped are standard-layout classes, and the C++14 standard has enough requirements to guarantee that in that case value and wrapper.wrapped are located at the same address. Draft n4296 says in 9.2 Class members [class.mem] §20:

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.

A note even says:

[ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

That means that you at least respect the strict aliasing rule from 3.10 Lvalues and rvalues [basic.lval] §10

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined
— the dynamic type of the object,
...
— an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

So this is perfectly defined:

cout << *(&ex.wrapper.wrapped) << endl;

because &ex.wrapper.wrapped is required to be the same as &ex.value, and the pointed-to object has the correct type. But the standard is explicit only about the common initial sequence, so my understanding is that cout << ex.wrapper.wrapped << endl invokes undefined behaviour, because a note about undefined behavior in 1.3.24 [defns.undefined] says (emphasis mine):

Undefined behavior may be expected when this International Standard omits any explicit definition of behavior...

TL;DR: I would bet a coin that most, if not all, common implementations will accept it, but because of the note from 1.3.24 [defns.undefined], I would never use that in production code and would use *(&ex.wrapper.wrapped) instead.


In the more recent draft n4659 for C++17, the relevant notion is pointer-interconvertibility ([basic.compound] §4).

Serge Ballesta
  • Uh, I wasn't aware of [defns.undefined]. Thanks for bringing that up. The wording in C++11's 9.2§20 is very similar, but explicitly mentions pointers: *"A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa"*. So you can cite n3337 if you want to include C++11 in your answer. – Zeta Jan 02 '18 at 11:45
  • Dereferencing a pointer that has the right address and type does not make the object it refers to accessible. – Oliv Jan 02 '18 at 12:30
  • @SergeBallesta Literally: `reinterpret_cast(&union_A.member_a)->member_b`, supposing that member_b is the active member; member_a does not need to be active. – Oliv Jan 02 '18 at 12:43
  • @Oliv: That's the reason why I use a pointer. As `value` has just been assigned, it is the active member of the union. Because `value` and `wrapper.wrapped` have the same address and the same type, `&ex.wrapper.wrapped` is in fact a pointer to `ex.value`, which is the valid member. That's the reason why it can be dereferenced. – Serge Ballesta Jan 02 '18 at 12:55
  • @SergeBallesta, The standard never says that a pointer with the right value and type is a *pointer to object* [basic.compound], and it gives many examples where a pointer with the right address and type is an invalid pointer. For example, `int arr[10]{}; *reinterpret_cast(&arr)=1` is UB even if the address of the array and the address of its first element are the same. – Oliv Jan 02 '18 at 13:26
  • @Oliv [basic.compound] §3 explicitly contains: *If an object of type T is located at an address A, a pointer of type cv T\* whose value is the address A is said to point to that object, **regardless of how the value was obtained**.* – Serge Ballesta Jan 02 '18 at 13:29
  • @SergeBallesta Which version of the standard? I do not find it in the latest one. – Oliv Jan 02 '18 at 13:31
  • @SergeBallesta Found it, they have just removed this exact sentence in the C++17 standard! So until C++17 your answer is right for sure. If you could make a small edit (and maybe mention it) I could upvote again. – Oliv Jan 02 '18 at 13:36
  • @Oliv n4296 for C++14. Same sentence in n3337 for C++11. In n4659 for C++17 the notion is the *inter-convertibility* of pointers in §4 of basic compound. A pointer to an array and a pointer to its first element are not inter-convertible because they have different types. – Serge Ballesta Jan 02 '18 at 13:39
  • I admit the example is not perfect. Here is a closer one: `unsigned char buffer[2*sizeof(int)]; auto p1=new(buffer) int{}; auto p2 = new(p1+1) int{}; *(p1+1)=10; //UB`. Here `*(p1+1)` does not point to `*p2` even though `p1+1` has the right type and the right address. – Oliv Jan 02 '18 at 13:43
  • @SergeBallesta I have asked a question about that, I hope to get a clear answer! https://stackoverflow.com/questions/48062346/is-a-pointer-with-the-right-address-and-type-still-always-a-valid-pointer-since – Oliv Jan 02 '18 at 14:02
  • You can't get around the lifetime rules that easily. There's a decent argument that simply writing `ex.wrapper.wrapped` is UB by omission because [expr.ref]/4.2 defines the behavior of the class member access expression only when the first expression (here, `ex.wrapper`) designates an object, but there's no living `Wrapped` object. Moreover, if `Wrapped` has a nontrivial constructor, then [class.cdtor]/1 makes that same expression explicitly UB. – T.C. Jan 03 '18 at 02:24