
Consider a simple union with a changed "active member":

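union U {
  int i;
  char *p;
};

int main() {
  U u = { 1 };  // initializes u.i, making it the active member
  u.p = 0;      // assigns to the other member: what does u.p refer to here?
}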

Is there any revision of the C++ standard that can properly define what happens here?

In particular, what is u.p semantically? It's an lvalue at compile time, but what does its evaluation refer to at run time?

Can a pointer object exist in u before it's assigned to?

Can objects exist before their lifetime even begins?

Can two scalar objects (of distinct types) coexist at the same time at the same address?

curiousguy
  • What is the difference between 'a simple union with a changed "active member"' and 'a simple union'? The latter is how I would describe the one in the shown code. – Yunnosch Oct 30 '19 at 21:14
  • A union object w/o an active member still has an address just like `int i;` you can take the address of i, and declare a reference to it, but using its value w/o initializing is UB. – doug Oct 30 '19 at 21:18
  • @doug Are you doubting that `int i;` creates an `int` object? – curiousguy Oct 30 '19 at 21:22
  • Union is just a way to save memory by storing different things in the same memory **at different times**. There is nothing special about them and if you try to access a member before initializing it, then it is just like accessing any other variable without initializing it. You have undefined behavior. – darcamo Oct 30 '19 at 21:33
  • `int i;` has uninitialized storage but its lifetime hasn't begun. – doug Oct 30 '19 at 21:37
  • The compiler will treat it however your code does. If you try to pass the integer to `printf` you will get a compiler error. *You can really think of a union in that sense as multiple fields in an object that just happen to share the same memory.* It can be useful, for example, to have one field that identifies the type and a separate union field that holds the value. – Pickle Rick Oct 30 '19 at 21:45
  • More interesting scenarios arise with array objects within unions, since an expression like `union.arrayMember[i]=something` has to take the address of `union.arrayMember[0]` as a non-lvalue and then dereference that to form the lvalue upon which the assignment is performed, which would imply that `arrayMember` must exist before the assignment. – supercat Oct 30 '19 at 22:23
  • @supercat Yes, but I want a clear explanation of the simplest possible case: setting an active member first. Even in the simplest case I think the std is defective. – curiousguy Oct 30 '19 at 23:08
  • @PickleRick "_You can really think of a union in that sense as multiple fields in an object that just happen to share the same memory_" Thank you, I know what they are for; I want to know how the semantics of that fundamental stuff are defined, in particular, **what an object is.** – curiousguy Oct 30 '19 at 23:09
  • @Yunnosch Without the ability to change the active member, there isn't much practical use for even a simple union... – curiousguy Oct 30 '19 at 23:11
  • @curiousguy: In the abstraction used by Ritchie's Language, allocating a region of storage simultaneously creates every object of every type that will fit anywhere within it; writing to any of those objects will change the contents of the storage, and reading an object will interpret whatever is there as a value of the proper type. In C++, for standard-layout types with trivial constructors, there's no reason the same rule couldn't apply if all accesses to storage had to be done via glvalues that were related somehow, or separated by a "something weird is happening" indication. – supercat Oct 30 '19 at 23:12
  • @curiousguy: The real problem is that the authors of the C Standard described the situations where compilers must allow for aliasing between *seemingly-unrelated* lvalues, and never imagined that compiler writers would use that as an excuse to ignore obvious relationships between lvalues that were, at time of use, each individually freshly derived from a common base. – supercat Oct 30 '19 at 23:15
  • @supercat Several ppl on SO told me that the idea of many obj (infinitely many actually, there are inf. many scalar types) "existing" at the same location is patently absurd and impossible; yet nobody was able to show a contradiction or impl difficulty arising from such a hypothesis. – curiousguy Oct 30 '19 at 23:16
  • Sorry, maybe I'm misunderstanding your question. What I really mean is that the behavior for a union is really the same as having that type outside of a union. Whenever you access the object as one particular type, the same rules apply. It will reside in memory as long as it's in scope, as you would expect. There's really no difference other than sharing a memory address, so I'm not sure how else to answer. There are some subtle differences, such as members not being destroyed automatically, but you can handle that in a custom destructor. – Pickle Rick Oct 30 '19 at 23:20
  • @PickleRick The Q is about the existence of objects. **Do more than one scalar obj exist at the same location?** – curiousguy Oct 30 '19 at 23:24
  • Ah, I'm sorry, the answer to that is no. The union will only ever have one entry at a time held in memory. – Pickle Rick Oct 30 '19 at 23:25
  • @PickleRick If the other obj does not exist, how can you assign it a value? **What is the expression naming a nonexistent obj actually referring to?** – curiousguy Oct 30 '19 at 23:27
  • All of these things happen the same as they normally would. So doing `obj.u.val = 32;` is no different than `obj.val = 32;`, assuming they're the same type. There's not really a defined relationship in a union; it's more like a `void*` pointer used to store different data of the same size. – Pickle Rick Oct 30 '19 at 23:30
  • If you have a union holding an integer and a float, setting the float to 1.0 would in effect set the integer to 0x3F800000 for example. – Pickle Rick Oct 30 '19 at 23:32
  • It looks to me like the Standard ought to say only that `u.p = 0` begins the lifetime of `u.p`, not that it creates the subobject. – aschepler Oct 31 '19 at 01:37

1 Answer


u.p refers to storage allocated for an object whose lifetime has not yet started, as permitted by [basic.life]/7: "Before the lifetime of an object has started but after the storage which the object will occupy has been allocated... any glvalue that refers to the original object may be used but only in limited ways."

Then there's the special magic by which an assignment to a union member starts the lifetime of the object:

[class.union]/5 When the left operand of an assignment operator involves a member access expression ([expr.ref]) that nominates a union member, it may begin the lifetime of that union member, as described below...

In an assignment expression of the form E1 = E2 that uses either the built-in assignment operator ([expr.ass]) or a trivial assignment operator ([class.copy.assign]), for each element X of S(E1), if modification of X would have undefined behavior under [basic.life], an object of the type of X is implicitly created in the nominated storage; no initialization is performed and the beginning of its lifetime is sequenced after the value computation of the left and right operands and before the assignment.

Igor Tandetnik
  • So is there an object there? What is the "original object"? And how can we know what will happen in future? **Time travel?** – curiousguy Oct 31 '19 at 02:05
  • I feel like you're mistaking an *object* for its *memory*. If you have a union field in some object and that object's lifetime starts, it will **NOT** start the lifetime of the union's fields. Storage is allocated for the union, but the fields' lifetimes haven't started yet. If, at any point, you assign any union field, then that object's lifetime will begin. If you set the same field again, the lifetime of its previous value will end. If you set a different field, the previously set field will mistakenly still be *alive*. – Pickle Rick Oct 31 '19 at 02:28
  • Think of every field in the union as an uninitialized object. Any time a specific field in that union is accessed, it's no different than if it were outside of a union. However, accessing one field in the union will not *notify* the other members in any way. Finally, when the lifetime of the union ends, it will not inherently destroy all the members of the union because, of course, they all share the same memory and only one can be live at a time. So in this case you define a union destructor to clean up the appropriate field. – Pickle Rick Oct 31 '19 at 02:31
  • The absolute simplest way to think about it is that a union is an object of its own where all fields share the same address. Setting 'u.ptr' to something new will initialize that object, setting 'u.ptr' to nullptr will destroy that object, but setting 'u.val' to 0 will not destroy the 'u.ptr' object currently held in that memory space. When ParentObj's lifetime ends, it will not destroy anything in the union because it is its own type, but you can use your own destructor for that. – Pickle Rick Oct 31 '19 at 02:36
  • @PickleRick _If, at any point, you assign any union field then that objects lifetime will begin_ No, a new object will be created: _if modification of `X` would have undefined behavior under [basic.life], an object of the type of `X` is implicitly created_ – Language Lawyer Oct 31 '19 at 04:42
  • [Lifetime of a member of a union begins when that member is made active.](https://en.cppreference.com/w/cpp/language/lifetime). – Pickle Rick Oct 31 '19 at 05:29
  • @PickleRick Are you seriously using cppreference as a counterargument to the Standard? – Language Lawyer Oct 31 '19 at 06:31
  • The issue is that the object seems to be created after the user used its name to refer to it, so an lvalue refers to a future entity. It's pretty weird. – curiousguy Oct 31 '19 at 09:33
  • @curiousguy _The issue is that the object seems to be created after the user used its name to refer to it_ Who told you this? Maybe the object is created before the evaluation of `E1 = E2`. – Language Lawyer Oct 31 '19 at 12:01
  • @curiousguy There are other cases where a name can refer to storage that doesn't have a live object in it. E.g. `MyClass c; c.~MyClass(); new(&c) MyClass;`. – Igor Tandetnik Oct 31 '19 at 12:35
  • @LanguageLawyer The standard is very explicit about the sequence of events: "...the beginning of its lifetime is sequenced after the value computation of the left and right operands and before the assignment." Until right before the assignment, `E1` only manipulates storage. It's in this limbo of **[basic.life]** where the storage has already been allocated but the lifetime has not yet started. – Igor Tandetnik Oct 31 '19 at 12:44
  • @LanguageLawyer That said, it doesn't matter much in practice. This way of activating a union member only works with a built-in or trivial assignment, which means it only applies to simple types for which there isn't much difference between "has storage" and "is alive". This whole discussion is mostly of academic interest. – Igor Tandetnik Oct 31 '19 at 12:49
  • @LanguageLawyer Sorry, bad form, I'll just quote the remainder of what you shared. "If modification of X would have undefined behavior under [basic.life], an object of the type of X is implicitly created in the nominated storage; no initialization is performed *and the beginning of its lifetime is sequenced after the value computation of the left and right operands and before the assignment*. [ Note: This ends the lifetime of the previously-active member of the union, if any ([basic.life])." – Pickle Rick Oct 31 '19 at 15:30
  • It should be added that if the active union member has a non-trivial destructor, then when setting an inactive member you will need to manually ensure the active object's destructor is called. – Pickle Rick Oct 31 '19 at 15:35
  • @IgorTandetnik A "_discussion is mostly of academic interest_" is the diff between sound and unsound formal semantics (or semiformal or any specification), even if "everybody knows what should happen", as is obviously the case with such simple uses of a union. Note that **historically "everybody knows" was OK** because it was source code=>abs tree=>codegen; **now it isn't, as UB cases are exploited to transform programs.** Compiler writers reject bad codegen bug reports based on "that was always UB", **so we need to agree on the exact definition of things.** That cannot be overstated. – curiousguy Nov 01 '19 at 01:31
  • @IgorTandetnik "_where a name can refer to storage that doesn't have a live object_" *YES* and BTW that other case is equally deeply troubling to me. I don't feel any of this is adequately described (= rigorously described). – curiousguy Nov 01 '19 at 02:08