How to access an object's storage through an aggregate

Question

In "Lvalues and rvalues", [basic.lval] (3.10), the C++ standard contains a list of types such that it is valid to "access the stored value of an object" through a glvalue of such a type (paragraph 10). Specifically, it says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

the dynamic type of the object,

[some unimportant details about CV and signed/unsigned]

an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

[some more stuff]

What exactly does the "aggregate" rule mean? How do I access an object's stored value through a glvalue of some general aggregate type?!

I'm picturing something like this:

int a = 10;                                      // my "stored value"

struct Foo { char x; float y; int z; bool w; };  // an aggregate

reinterpret_cast<Foo&>(a).y = 0;                 // ???

Doesn't the final cast produce a glvalue of "an aggregate type that includes the dynamic type of a", and thus make this valid?

I could be wrong about this, but I think the rule given above is a *necessary* condition to avoid UB rather than a *sufficient* condition. What you're doing above almost certainly breaks some other rule and so would be UB. :-) — templatetypedef, Nov 28 '13 at 22:04
@templatetypedef: Probably, but I couldn't see anything that wasn't already covered by the other rules... — Kerrek SB, Nov 28 '13 at 22:21
Might this refer to [class.mem]/18? In C++03, this was still defined in terms of POD, where a POD was an aggregate. — dyp, Dec 18 '13 at 15:24

score 7 · Answer 1 · answered Dec 22 '13 at 23:51


The intent of that list is not to provide you with alternate ways to access an object but rather, as the footnote to the list indicates, to enumerate all the ways an object might be aliased. Consider the following example:

struct foo
{
    char x; 
    float y; 
    int z; 
    bool w;
};

void func( foo &F, int &I, double &D )
{
    //...
}

What that list is saying is that accesses to F may also access the same underlying object as accesses to I. This could happen if you passed a reference to F.z in for I, like this:

func(F, F.z, D);

On the other hand, you can safely assume no access to F accesses the same underlying object as D, because struct foo does not contain any members of type double.

That's true even if some joker does this:

union onion
{
    struct foo F;
    double D;
};

onion o; 
int i;

func( o.F, i, o.D );  // [class.union] (9.5) wants a word with you.  UB.

I'm not sure that the union was central to your question. But the part before the union example highlights why the aggregate rule exists.

Now let's consider your example, reinterpret_cast<Foo&>(a).y = 0. Paragraph 11 of [expr.reinterpret.cast] (5.2.10) has this to say:

An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). The result refers to the same object as the source lvalue, but with a different type. The result is an lvalue for an lvalue reference type or an rvalue reference to function type and an xvalue for an rvalue reference to object type. No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called.⁷¹

⁷¹ This is sometimes referred to as a type pun.

In the context of your example, it's saying that if it's legal to convert a pointer-to-int to a pointer-to-Foo, then your reinterpret_cast<Foo&>(a) is legal and produces an lvalue. (Paragraph 1 tells us it will be an lvalue.) And, as I read it, that pointer conversion is itself OK, according to paragraph 7:

A pointer to an object can be explicitly converted to a pointer to a different object type. When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1. Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.

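You have standard-layout types with compatible alignment constraints (on typical implementations alignof(Foo) is no greater than alignof(int), although the standard does not guarantee that). So what you have there is a type pun that yields an lvalue. The rule you listed does not, on its own, make it undefined.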

So what might make it undefined? Well, for one, [class.mem] (9.2) paragraph 21 reminds us that a pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member, and vice versa. So after your type pun, you're left with a reference to Foo whose member x sits at the same address as a.

And... this is where my language lawyering peters out. I know in my gut that accessing Foo through that franken-reference is at best implementation-defined or unspecified. I can't find where it's explicitly banished to the realm of undefined behavior.

But I think I answered your original question: why is the aggregate rule there? It gives you a very basic way to rule on potential aliases without further pointer analysis.


Joe Z


Thanks a lot for this detailed treatment! One question, though: If I added a virtual function to the first `foo` (and thus made it not an aggregate), how would that change the argument? – Kerrek SB Dec 23 '13 at 00:53
@KerrekSB: Well, at that point, it's no longer a standard layout class. The last sentence of 5.2.10 paragraph 7 seems to apply: "The result of any other such pointer conversion is unspecified." Furthermore, you lose the guarantee in 9.2 that the pointer to the class is also a pointer to its first member. (In all likelihood, the vtbl pointer is at the top, hiding from you.) You do still get the guarantee that a pointer to `Foo` points to its first byte, though. – Joe Z Dec 23 '13 at 01:17
Yes, I understand that, but does it mean that `func(F, F.z, D);` no longer accesses `F`? – Kerrek SB Dec 23 '13 at 01:18
@KerrekSB: Ok, carefully parsing that paragraph, I would lean toward the ruling that a "reference/pointer to `Foo`" would never be construed to potentially alias a "reference/pointer to `int`" in that case. I'm going to guess that the aggregate rule is there to bridge back to C. That said, in researching this question, I found many places (including IBM's XL compiler docs) that lump all classes, structs and unions under the "aggregate" banner. – Joe Z Dec 23 '13 at 03:14
@KerrekSB: Reading between the lines here— http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#584 —it appears that they've very carefully chosen to leave non-aggregate classes out of the statement you highlighted in 3.10. Notice how they purposefully keep non-aggregate unions, specifically. Or maybe I'm reading too much into that ruling? – Joe Z Dec 23 '13 at 03:35
I don't know, that's why I've asked :-) It's an interesting idea that you're not allowed to alias into non-aggregate class members, but that just feels totally wrong. Relying on such absence of aliasing would seem to break tons of common code. – Kerrek SB Dec 23 '13 at 12:53
@KerrekSB: I agree. I tried to tempt G++ to take the no-alias bait, and it didn't, even with `-O3 -fstrict-aliasing`. It was applying the aggregate rule even to non-aggregate classes. – Joe Z Dec 23 '13 at 16:18
@KerrekSB: I finally found a compiler and some code that would take the bait. Yow. But, I had to coax it with a flag that says "absolutely strict aliasing". I'm not certain it proves anything, but I thought it was interesting. You can see the code and output here: http://spatula-city.org/~im14u2c/stackoverflow/3-10/ In the "aggressive" version, you can see that the FADDDP just does "ret += ret" because it adds `B5:B4` to `B5:B4`, while in the "conservative" version it reloads `dbl` and adds the reloaded value to `ret`. (It's TMS320C6600 assembly, if you're curious.) – Joe Z Dec 25 '13 at 03:30
(Some C66x notes: It's an exposed delay slot architecture, so a `RET` takes effect 6 cycles after it's issued, hence the `; BRANCH OCCURS` comment in the compiler output. `LDDW` takes 5 cycles and `FADDDP` takes 4. `DADD 0, B5:B4, A5:A4` moves a value from `B5:B4` to `A5:A4`, which is also the register pair return values go in.) – Joe Z Dec 25 '13 at 03:34
That's amazing, thanks! What I find most surprising is that you are essentially forbidden from binding general class members to references. I had never heard of this rule before. – Kerrek SB Dec 25 '13 at 13:54
The `onion` example in this answer is interesting ("... even if some joker does this ..."), but I suspect that there would be one case where the compiler would be required to accept that a `double&` could alias a `foo&`: specifically, imagine if `func` was modified to take another parameter, of type `onion&`. This would be called with `func( o.F, i, o.D, o );` in this example. Therefore, within `func`, the compiler would be forced to see that all these reference types can alias each other. Am I correct in this? – Aaron McDaid Sep 26 '16 at 18:27
@AaronMcDaid: If `u` is an object of union type with members `m1` and `m2`, I think it's pretty clear the authors of the Standard didn't want to allow code to form references to both members and use them interchangeably, but I don't think the authors of the Standard thought using a reference to `u.m1` to access the storage associated with `u`, and later using a reference to `u.m2` likewise, should cause any problem *in cases where the two references don't exist simultaneously*. Unfortunately, so far as I can tell, no normative text in the Standard distinguishes those patterns. – supercat Nov 08 '18 at 18:28
@AaronMcDaid: As it is, the Standard doesn't specify *any* situation where code could form a reference to a union member of non-character type and then access the union via that reference, but instead relies upon compiler writers to handle useful cases when practical without regard for whether the Standard mandates them. – supercat Nov 08 '18 at 18:39
@JoeZ `So what might make it undefined?` - isn't the same mentioned in para7 itself, namely - *Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. **The result of any other such pointer conversion is unspecified.***, so shouldn't you cast it back to original type? – Cheshar Jun 07 '20 at 20:47

score 2 · Answer 2 · answered Nov 28 '13 at 22:12


The item of the clause just refers to the normal access to members of any aggregate (struct, class, or array) or union: You need to be able to access the stored values of objects without causing undefined behavior. The clause only states necessary conditions: at least one of the items has to be true. It doesn't state sufficient conditions, i.e., in addition to these conditions other conditions may need to hold, too.


Dietmar Kühl


But if I just had `Foo b`, then `b.y` would just be an ordinary lvalue of type `float` and be covered by the first rule, so why do we need this extra rule? – Kerrek SB Nov 28 '13 at 22:15
If your glvalue is of type `Foo`, the first item covers access to entities with type `Foo` ("the dynamic type of the object" with "the object" referring to the glvalue), i.e., the entire object. The extra clause states that you can also access subobjects. – Dietmar Kühl Nov 28 '13 at 22:23
Then why does it say "aggregate or union"? How do you access non-aggregate members? – Kerrek SB Nov 28 '13 at 22:39
What should have been used? "member" doesn't work, for example, because it doesn't apply to arrays (they don't have members but elements). ... and a `union` isn't an aggregate but you should still be able to access the object currently stored in a `union`. – Dietmar Kühl Nov 28 '13 at 22:47
I meant, what about members of non-aggregate classes? – Kerrek SB Nov 28 '13 at 23:05
Hm. I see your point: not all classes are aggregates and there isn't a clause indicating how their members are accessed... – Dietmar Kühl Nov 28 '13 at 23:16
