
As the (Working Draft of the) C++ Standard says:

9.5.1 [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.

But I don't know how to identify the active member of a union, and I'm not familiar enough with the standard to locate what it says about this. I tried to find out how the active member is set, but I only found how it is switched:

9.5.4 [class.union]

[ Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. —end note ] [Example: Consider an object u of a union type U having non-static data members m of type M and n of type N. If M has a non-trivial destructor and N has a non-trivial constructor (for instance, if they declare or inherit virtual functions), the active member of u can be safely switched from m to n using the destructor and placement new operator as follows:

u.m.~M();
new (&u.n) N;

end example ]

So my guess is that the active member of a union is the one first assigned, used, constructed or placement-new'ed; but this becomes tricky with uniform initialization. Consider the following code:

union Foo
{
    struct {char a,b,c,d;};
    char array[4];
    int integer;
};

Foo f; // default ctor
std::cout << f.a << f.b << f.c << f.d << '\n';

Which is the active member of the union in the code above? Is std::cout reading from the active member of the union? What about the code below?

Foo f{0,1,2,3}; // uniform initialization
std::cout << f.a << f.b << f.c << f.d << '\n';

With the line above we can initialize either the nested anonymous struct or the array. If I provide only one integer, I could be initializing Foo::a, Foo::array or Foo::integer... which one would be the active member?

Foo f{0}; // uniform initialization
std::cout << f.integer << '\n';

I guess that the active member would be the anonymous struct in all of the above cases, but I'm not sure.

If I want to activate one or the other union member, should I provide a constructor activating it?

union Bar
{
    // #1 Activate anonymous struct
    Bar(char x, char y, char z, char t) : a(x),b(y),c(z),d(t) {}
    // #2 Activate array
    Bar(char (&a)[4]) { std::copy(std::begin(a), std::end(a), std::begin(array)); }
    // #3 Activate integer
    Bar(int i) : integer(i) {}

    struct {char a,b,c,d;};
    char array[4];
    int integer;
};

I'm almost sure that #1 and #3 will make the anonymous struct and the integer, respectively, the active member, but I don't know about #2, because by the time we reach the body of the constructor the members are already constructed! So are we calling std::copy on an inactive union member?

Questions:

  • Which is the active union member of Foo when it is constructed with each of the following uniform initializations:
    • Foo{};
    • Foo{1,2,3,4};
    • Foo{1};
  • In the #2 constructor of Bar, is Bar::array the active union member?
  • Where in the standard can I read exactly which is the active union member and how to set it without placement new?
PaperBirdMaster

2 Answers


Your concern about the lack of a rigorous definition of the active member of a union is shared by (at least some of) the members of the standardization committee - see the latest note (dated May 2015) in the description of active issue 1116:

We never say what the active member of a union is, how it can be changed, and so on. [...]

I think we can expect some sort of clarification in future versions of the working draft. That note also indicates that the best we have so far is the note in the paragraph you quoted in your question, [9.5p4].

That being said, let's look at your other questions.

First of all, there are no anonymous structs in standard C++ (only anonymous unions); struct {char a,b,c,d;}; will give you warnings if compiled with reasonably strict options (-std=c++1z -Wall -Wextra -pedantic for Clang and GCC, for example). Going forward, I'll assume we have a declaration like struct { char a, b, c, d; } s; and everything else is adjusted accordingly.

The implicitly defaulted default constructor in your first example doesn't perform any initialization according to [12.6.2p9.2]:

In a non-delegating constructor, if a given potentially constructed subobject is not designated by a mem-initializer-id (including the case where there is no mem-initializer-list because the constructor has no ctor-initializer), then

(9.1) - if the entity is a non-static data member that has a brace-or-equal-initializer and either

(9.1.1) - the constructor’s class is a union (9.5), and no other variant member of that union is designated by a mem-initializer-id or
(9.1.2) - the constructor’s class is not a union, and, if the entity is a member of an anonymous union, no other member of that union is designated by a mem-initializer-id,

the entity is initialized as specified in 8.5;

(9.2) - otherwise, if the entity is an anonymous union or a variant member (9.5), no initialization is performed;

(9.3) - otherwise, the entity is default-initialized (8.5).

I suppose we could say that f has no active member after its default constructor has finished executing, but I don't know of any standard wording that clearly indicates that. What can be said in practice is that it makes no sense to attempt to read the value of any of f's members, since they're indeterminate.

In your next example, you're using aggregate initialization, which is reasonably well-defined for unions according to [8.5.1p16]:

When a union is initialized with a brace-enclosed initializer, the braces shall only contain an initializer-clause for the first non-static data member of the union. [ Example:

union u { int a; const char* b; }; 
u a = { 1 }; 
u b = a; 
u c = 1;               // error 
u d = { 0, "asdf" };   // error 
u e = { "asdf" };      // error 

end example ]

That, together with brace elision for the initialization of the nested struct, as specified in [8.5.1p12], makes the struct the active member. It answers your next question as well: you can only initialize the first union member using that syntax.

Your next question:

If I want to activate one or the other union member, should I provide a constructor activating it?

Yes, or a brace-or-equal-initializer for exactly one member according to [12.6.2p9.1.1] quoted above; something like this:

union Foo
{
    struct { char a, b, c, d; } s;
    char array[4];
    int integer = 7;
};

Foo f;

After the above, the active member will be integer. All of the above should also answer your question about #2 (the members are not already constructed when we reach the body of the constructor - #2 is fine as well).
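As a rough sketch of both options (my own illustration, not text from the standard): Foo uses a default member initializer, and Bar uses one constructor per member. The struct is named s as assumed above. Note that #2 here is a variation on the version in the question: it initializes array in the mem-initializer list instead of calling std::copy in the body. check_activation is a hypothetical helper name.

```cpp
#include <cassert>

// The struct member is named `s`, since anonymous structs are not standard C++.
union Foo
{
    struct { char a, b, c, d; } s;
    char array[4];
    int integer = 7;  // default member initializer: the implicit constructor initializes integer
};

union Bar
{
    // #1: initializes s
    Bar(char x, char y, char z, char t) : s{x, y, z, t} {}
    // #2 (variation on the question's version): initializes array directly
    // in the mem-initializer list rather than calling std::copy in the body
    Bar(const char (&a)[4]) : array{a[0], a[1], a[2], a[3]} {}
    // #3: initializes integer
    Bar(int i) : integer(i) {}

    struct { char a, b, c, d; } s;
    char array[4];
    int integer;
};

// Hypothetical helper: every read goes through the member that was initialized.
void check_activation()
{
    Foo f;
    assert(f.integer == 7);

    Bar b1('w', 'x', 'y', 'z');
    assert(b1.s.c == 'y');

    const char src[4] = {'a', 'b', 'c', 'd'};
    Bar b2(src);
    assert(b2.array[3] == 'd');

    Bar b3(42);
    assert(b3.integer == 42);
}
```

In each case the only member read afterwards is the one that the constructor (or the default member initializer) initialized.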

Wrapping up, both Foo{} and Foo{1} perform aggregate initialization; they're interpreted as Foo{{}} and Foo{{1}}, respectively, (because of brace elision), and initialize the struct; the first one sets all the struct members to 0 and the second one sets the first member to 1 and the rest to 0, according to [8.5.1p7].
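A minimal sketch of the three initializations from the question, assuming the struct is given the name s as above (sum_of_s is a hypothetical helper used only to keep the checks short). In all three cases only s is read, since that is the member the braces initialize.

```cpp
#include <cassert>

union Foo
{
    struct { char a, b, c, d; } s;  // first member: the one brace initialization targets
    char array[4];
    int integer;
};

// Hypothetical helper: adds up the four chars of the struct member.
int sum_of_s(const Foo& f)
{
    return f.s.a + f.s.b + f.s.c + f.s.d;
}

void check_aggregate_init()
{
    Foo empty{};            // all four chars of s are zero
    assert(sum_of_s(empty) == 0);

    Foo one{1};             // brace elision: s.a is 1, the rest are zero
    assert(one.s.a == 1 && one.s.b == 0 && one.s.c == 0 && one.s.d == 0);

    Foo four{1, 2, 3, 4};   // brace elision: all four initializers go to s
    assert(four.s.a == 1 && four.s.d == 4);
    assert(sum_of_s(four) == 10);
}
```

Some compilers may warn about the elided braces (for example with -Wmissing-braces); writing Foo{{1, 2, 3, 4}} makes the intent explicit.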


All standard quotes are from the current working draft, N4527.


Paper N4430, which deals with somewhat related issues, but hasn't been integrated into the working draft yet, provides a definition for active member:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]).

This effectively passes the buck to the definition of lifetime in [3.8], which also has a few issues open against it, including the aforementioned issue 1116, so I think we'll have to wait for several such issues to be resolved in order to have a complete and consistent definition. The definition of lifetime as it currently stands doesn't seem to be quite ready.
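To illustrate the lifetime-based reading, here is a small sketch that mirrors the u.m.~M(); new (&u.n) N; example from the note in [9.5p4]. The concrete types (std::string for M, std::vector<int> for N) and the function name switch_demo are my own choices for illustration.

```cpp
#include <cstddef>
#include <new>
#include <string>
#include <vector>

using M = std::string;
using N = std::vector<int>;

union U
{
    M m;
    N n;

    U() : m("hello") {}  // m's lifetime begins here
    ~U() {}              // empty on purpose: the owner destroys whichever member is alive
};

// Switches from m to n and returns the size of n.
std::size_t switch_demo()
{
    U u;                        // m is alive
    u.m.~M();                   // m's lifetime ends
    new (&u.n) N{1, 2, 3};      // n's lifetime begins
    std::size_t count = u.n.size();
    u.n.~N();                   // end n's lifetime before u itself goes away
    return count;
}
```

Under the N4430 wording, m would be the active member up to the destructor call and n from the placement new onwards.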

Language Lawyer
bogdan
  • Very extended answer with clear examples :) the document mentioned as source of the quotes is here: [N4527](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4527.pdf). – PaperBirdMaster Jul 14 '15 at 06:50
  • @PaperBirdMaster I've just stumbled upon a paper that makes an attempt at providing a definition, and I've added a reference to it to the answer. I've also added the link to the working draft - good idea to have it in there. – bogdan Jul 14 '15 at 19:15

The active member is the last member you wrote to. Simple as that.

The term is not defined by C++ because it is defined by English.
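A tiny sketch of that reading for the simple case of trivial members (the union Number and the function last_written are made up for illustration; whether plain assignment formally changes the active member is exactly what the draft wording leaves open):

```cpp
union Number
{
    int i;
    float f;
};

// Writes i, then f, and reads back only the member written last.
float last_written()
{
    Number n;
    n.i = 42;     // i is the last member written
    n.f = 1.5f;   // now f is the last member written
    return n.f;   // read the member written last
}
```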

Lightness Races in Orbit
  • Shouldn't you answer the questions at the end of the original post? – Jashaszun Jul 13 '15 at 16:09
  • @Jashaszun: This information covers them all. If the OP wants them treated as distinct questions then he should have posted them as such. – Lightness Races in Orbit Jul 13 '15 at 16:10
  • Fair enough. I guess it is really that simple, after all. – Jashaszun Jul 13 '15 at 16:10
  • Sorry for my lack of searching skills... :( I'm unable to locate where is stated that the active member is the last member written. – PaperBirdMaster Jul 14 '15 at 07:02
  • @PaperBirdMaster: _"The term is not defined by C++ because it is defined by English."_ – Lightness Races in Orbit Jul 14 '15 at 09:58
  • So, @LightnessRacesinOrbit what you're saying is that "*active*" in English means "*the last used one*" and therefore my question is implicitly answered by the semantic meaning of "*active*"? - as you can tell I'm not a native English speaker, hence the question. – PaperBirdMaster Jul 14 '15 at 12:10
  • @PaperBirdMaster: The term "active" implies something that is being or can be used. The phrase "at most can be active at any one time" tells us that we can then only "use" one member at a time. "Use" includes reading and writing. So, for any period of time, we shall only "use" one member. Granted, there is some reading between the lines here, but to a native speaker it's fairly unambiguous. The point is that the standard does not spell it out and doesn't really need to. :) – Lightness Races in Orbit Jul 14 '15 at 12:12
  • If a union `u` contains two structures `s1` and `s2` both of which have an "int" `i` as their first member, and if `s1` happens to be the active member, would `fscanf("%d", &(u.s2.i));` set the active member to s2? – supercat Oct 14 '16 at 19:04
  • @supercat: Good question. You may ask it [here](http://stackoverflow.com/questions/ask). – Lightness Races in Orbit Oct 14 '16 at 19:15
  • @LightnessRacesinOrbit: I was responding rhetorically to "simple as that"; in the easy cases, it's simple, but in the complex cases the only sensible treatment would be to allow for the possibility that a struct may sometimes have multiple active members which may be used interchangeably, but the wording of the Standard doesn't allow for that. – supercat Oct 14 '16 at 19:46
  • @LightnessRacesinOrbit: The phrase "simple as that" in your answer made it sound, at least to this native American English speaker, like it was intended to state a universally-applicable rule, and I would think many others would interpret it likewise. If your intention was to state a simple general-but-not-universal rule and note that the OP's situation does not involve any complicating exceptions, I don't think your language accurately reflected that. – supercat Oct 15 '16 at 17:00
  • @supercat: I'm writing [British] English. – Lightness Races in Orbit Oct 15 '16 at 17:45