6

I have some questions about unrestricted unions and their application in practice. Let's suppose I have the following code :

struct MyStruct
{
    MyStruct(const std::vector<int>& a) : array(a), type(ARRAY)
    {}
    MyStruct(bool b) : boolean(b), type(BOOL)
    {}
    MyStruct(const MyStruct& ms) : type(ms.type)
    {
        if (type == ARRAY)
            new (&array) std::vector<int>(ms.array);
        else
            boolean = ms.boolean;
    }
    MyStruct& operator=(const MyStruct& ms)
    {
        if (&ms != this) {
            if (type == ARRAY)
                array.~vector<int>(); // EDIT(2) 
            if (ms.type == ARRAY)
                new (&array) std::vector<int>(ms.array);
            else
                boolean = ms.boolean;
            type = ms.type;
        }
        return *this;
    }
    ~MyStruct()
    {
        if (type == ARRAY)
            array.~vector<int>();
    }

    union {
        std::vector<int> array;
        bool             boolean;
    };
    enum {ARRAY, BOOL} type;
};
  1. Is this code valid :) ?
  2. Is it necessary to explicitly call the vector destructor each time we are using the boolean (as stated here http://cpp11standard.blogspot.com/2012/11/c11-standard-explained-1-unrestricted.html)
  3. Why a placement new is required instead of just doing something like 'array = ms.array' ?

EDIT:

  • Yes, it compiles
  • "Members declared inside anonymous unions are actually members of the containing class, and can be initialized in the containing class's constructor." (C++11 anonymous union with non-trivial members)
  • Adding explicit destructors as suggested, leads to SIGSEV with g++ 4.8 / clang 4.2
Community
  • 1
  • 1
3XX0
  • 1,315
  • 1
  • 13
  • 25
  • Unions are sometimes used to subvert the type-system - type punning. In that case, you wouldn't destruct the `vector` before reading/writing the `bool`, but the standard doesn't (and can't) define what that means - the effect depends on details of the platform. You probably wouldn't do it with `vector` anyway - maybe with a char array to see the bytes. If you're not punning, you can only read out what was last written in (or else expect data corruption) and you must remove (destruct) anything already in the union before writing in something else (or else expect memory/resource leaks). –  May 21 '13 at 06:47

2 Answers2

3
  1. The code's buggy: change array.clear(); to array.~vector<int>();

Explanation: operator= is using placement new over an object that hasn't been destructed, which could do anything but practically you can expect it to leak the dynamic memory the previous array had been using (clear() doesn't release memory / change capacity, it just destructs elements and changes size).

From 9.5/2:

If any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union.

So, the vector constructor, destructor etc never kicks in by themselves: you must call them explicitly when wanted.

In 9.5/3 there's an example:

Consider the following union:

union U {
    int i;
    float f;
    std::string s;
};

Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor. To use U, some or all of these member functions must be user-provided.

That last bit - "To use U, some or all of these member functions must be user-provided." - seems to presume that U needs to coordinate its own vaguely value-semantic behaviour, but in your case the surrouding struct is doing that so you don't need to define any of these union member functions.

2: we must call the array destructor whenever an array value is being replaced by a boolean value. If in operator= a new array value is being placement-newed instead of assigned, then the old array must also have its destructor called, but using operator= would be more efficient when the existing memory is sufficient for all the elements being copied. Basically, you must match constructions and destructions. UPDATE: the example code has a bug as per your comment below.

3: Why a placement new is required instead of just doing something like 'array = ms.array' ?

array = ms.array invokes std::vector<int>::operator= which always assumes the this pointer addresses an already properly constructed object. Inside that object you can expect there to be a pointer which will either be NULL or refer to some internal short-string buffer, or refer to heap. If your object hasn't been destructed, then operator= may well call a memory deallocation function on the bogus pointer. Placement new says "ignore the current content of the memory that this object will occupy, and construct a new object with valid members from scratch.

Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
  • Destructing explicitly the vector in the boolean constructor/copy constructor leads to SIGSEV :( I'm missing something here. Concerning anonymous union member, it seems allowed by the standard (see http://stackoverflow.com/questions/10693913/c11-anonymous-union-with-non-trivial-members) – 3XX0 May 21 '13 at 06:20
  • @3XX0: sorry - the Wikipedia's C++11 article (http://en.wikipedia.org/wiki/C%2B%2B11#Unrestricted_unions) is a bit misleading at "// Due to the Point member, a constructor definition is now required." - I shouldn't have trusted it without checking the Standard. Have updated my answer accordingly, and with the Standard as quoted above the copy constructor shouldn't call the vector destructor - sorry for the bum steer. – Tony Delroy May 21 '13 at 07:11
  • @3XX0: the Wikipedia article mentioned in my link above is consistent with 9.5/3 as discussed in my article - I'd say both are misleading in your case. – Tony Delroy May 21 '13 at 07:19
  • Thanks for the update. Indeed, I neglected the destructor in the assignment operator (my bad). However the following example (from the article) deletes the vector while constructing a string. Doing so with my example doesn't seem to be the right thing to do (SIGSEV) `str_int(std::string str) { _raw.~vector(); new (&str) std::string(str); }` – 3XX0 May 21 '13 at 07:46
  • @3XX0: th example from the article above is a constructor... as per 9.5/2 quoted above the `std::string` member won't be constructed as the union's constructor is implicitly deleted, so the constructor shouldn't invoke the destructor; the blog example code is just wrong. Just use the placement `new`. – Tony Delroy May 21 '13 at 08:35
2

The union does not declare a default constructor, copy constructor, copy assignment operator, or destructor.

If std::string declares at least one non-trivial version of a special member function (which is the case), the forementioned ones are all implicitly deleted, and you must declare (and define) them (... if they're used, which is the case).

Insofar, that code isn't correct and should not successfully compile (this is almost-to-the-letter identical to the example in 9.5 par 3 of the standard, except there it's std::string, not std::vector).
(Does not apply for an anon union, as correctly pointed out)

About question (2): In order to safely switch the union, this is necessary, yes. The Standard explicitly says that in 9.5 par 4 [Note].
It makes sense too, if you think about it. At most one data member can be active in a union at any time, and they're not magically default constructed/destroyed, which means you need to properly construct/destruct things. It is not meaningful (or even defined) to use the union as something else otherwise (not that you couldn't do that anyway, but it's undefined).
The object is not a pointer, and you don't know whether it's allocated on the heap either (even if it is allocated on the heap, then it's inside another object, so it's still not allowable to delete it). How do you destroy an object if you can't call delete? How do you allocate an object -- possibly several times -- without leaking if you can't delete it? This doesn't leave many choices. Insofar, the [Note] makes perfect sense.

Damon
  • 67,688
  • 20
  • 135
  • 185
  • Since it's a anonymous union it's allowed to use class facilities instead (http://stackoverflow.com/questions/10693913/c11-anonymous-union-with-non-trivial-members) Concerning your answer (2) I couldn't agree more but destructing the vector leads to a SIGSEV ... Even the last code snippet from the node segfault (clang 3.2 / g++ 4.8) – 3XX0 May 21 '13 at 06:46