
I have (re?)invented this approach to zero-cost properties with data member syntax. By this I mean that the user can write:

some_struct.some_member = var;
var = some_struct.some_member;

and these member accesses redirect to member functions with zero overhead.

While initial tests show that the approach does work in practice, I'm far from sure that it is free from undefined behaviour. Here's the simplified code that illustrates the approach:

#include <array>
#include <iostream>

template <class Owner, class Type, Type& (Owner::*accessor)()>
struct property {
    operator Type&() {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)();
    }
    Type& operator= (const Type& t) {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)() = t;
    }
};

union Point
{
    int& get_x() { return xy[0]; }
    int& get_y() { return xy[1]; }
    std::array<int, 2> xy;
    property<Point, int, &Point::get_x> x;
    property<Point, int, &Point::get_y> y;
};

The test driver demonstrates that the approach works and it is indeed zero-cost (properties occupy no additional memory):

int main()
{
    Point m;
    m.x = 42;
    m.y = -1;

    std::cout << m.xy[0] << " " << m.xy[1] << "\n";
    std::cout << sizeof(m) << " " << sizeof(m.x) << "\n";
}

Real code is a bit more complicated, but the gist of the approach is here. It is based on using a union of real data (xy in this example) and empty property objects. (The real data must be a standard-layout class for this to work.)

The union is needed because otherwise properties needlessly occupy memory, despite being empty.
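
The size argument can be sketched as follows. This is only an illustration: `empty_x`, `empty_y`, `as_struct`, and `as_union` are hypothetical stand-ins for the property types and for `Point`, and the second assertion assumes a typical implementation where the union is no larger than its largest member.

```cpp
#include <array>

// Hypothetical empty stand-ins for the two property types.
struct empty_x {};
struct empty_y {};

// As ordinary struct members, each empty object still needs its own storage.
struct as_struct {
    std::array<int, 2> xy;
    empty_x x;
    empty_y y;
};

// As union members, the empty objects overlap the real data.
union as_union {
    std::array<int, 2> xy;
    empty_x x;
    empty_y y;
};

static_assert(sizeof(as_struct) > sizeof(std::array<int, 2>),
              "empty members enlarge the struct");
static_assert(sizeof(as_union) == sizeof(std::array<int, 2>),
              "empty members add nothing to the union (typical implementations)");
```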

Why do I think there's no UB here? The standard permits accessing the common initial sequence of standard-layout union members. Here, the common initial sequence is empty. Data members of x and y are not accessed at all, as there are no data members. My reading of the standard indicates that this is allowed. reinterpret_cast should be OK because we are casting a union member to its containing union, and these are pointer-interconvertible.

Is this indeed allowed by the standard, or am I missing some UB here?

timrau
n. m. could be an AI
  • 1
    I think there is no UB, at least not with c++11 and later. However I would not make Point a union, but only place the data member(s) and the corresponding properties into an anonymous union inside Point. Then use reinterpret_cast in the properties to cast to the data member (not to the class Point). This way you can inherit from Point and the approach probably scales better since you (or child classes) can place more than one anonymous union inside the class. – Andreas H. Feb 10 '19 at 14:08
  • @AndreasH. I'm doing exactly what you suggest in real code, however it makes things more complicated. I have simplified it for presentation purposes. – n. m. could be an AI Feb 10 '19 at 14:13
  • Doesn't pointer-interconvertibility imply an object to be alive to change the pointer value to point to it? Or this is only required by `std::launder`? – Language Lawyer Feb 10 '19 at 14:25
  • 2
    The only potential for UB I can think of is [class.mfct.non-static]/2. The object is inactive when its member function is called. – Passer By Feb 10 '19 at 14:28
  • @LanguageLawyer No, you can acquire pointers to inactive objects of the same union. – Passer By Feb 10 '19 at 14:28
  • @PasserBy but it's still an object of the correct type, although inactive. – n. m. could be an AI Feb 10 '19 at 14:37
  • @LanguageLawyer the standard says "A union object and its non-static data members are pointer-interconvertible", although only in a note. It doesn't say "A union object and its active member..." In general one needs a pointer to a member in order to make that member active, so it should be possible to obtain a pointer to an inactive member. – n. m. could be an AI Feb 10 '19 at 14:41
  • @n.m. «there *is* an object b ... that is pointer-interconvertible with a» in [expr.static.cast]/13 makes me wonder, can we say that an object «is» when it is not alive. _In general one needs a pointer to a member in order to make that member active_ But one doesn't need pointer-interconvertibility to get such pointer. – Language Lawyer Feb 10 '19 at 14:50
  • 1
    @LanguageLawyer "A union object and its non-static data members are pointer-interconvertible" is more than enough for me. If you think this statement doesn't really guarantee interconvertibility for *all* members, as opposed to only the active member, you are welcome to file a defect report. – n. m. could be an AI Feb 10 '19 at 15:39
  • @n.m. what if I don't think this is a defect? – Language Lawyer Feb 10 '19 at 16:14
  • @LanguageLawyer Don't submit a report then. – n. m. could be an AI Feb 10 '19 at 16:16
  • Fwiw, I've been here with my own _Really Clever Design (R) (TM)_ that also exploited `union`s, and after entrenching it in my program, discovered that it was UB for the same reason. That was a fun rewrite... (I mean, in totality, it was, because I ended up with code that was better and more flexible for other reasons - but I didn't like being rushed stressfully into it!) – underscore_d Feb 10 '19 at 23:48
  • _"A union object and its non-static data members are pointer-interconvertible" is more than enough for me. If you think this statement doesn't really guarantee interconvertibility for all members, as opposed to only the active member_ M-m-m-kay. If pointer-interconvertibility doesn't care about activity of members, which member subobject I'm interconvertible with in `union U { char a; char b; } u {}; reinterpret_cast<char*>(&u);`? – Language Lawyer Sep 06 '20 at 21:33
  • @LanguageLawyer I don't know, this looks like a defect in the standard to me. – n. m. could be an AI Sep 06 '20 at 22:17

2 Answers


TL;DR This is UB.

[basic.life]

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage, and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if: [...]

  • the glvalue is used to call a non-static member function of the object, or

By definition, an inactive member of a union isn't within its lifetime.
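
A minimal sketch of the rule quoted above, using hypothetical names (`Tag`, `Holder`, `call_after_activation`, `demo`) that are not part of the question. The commented-out call is the kind of use the quoted paragraph makes undefined; the assignment that follows relies on the C++17 wording under which assigning to a union member through a trivial assignment operator starts that member's lifetime.

```cpp
#include <cassert>

// Hypothetical empty class with a non-static member function.
struct Tag {
    int id() const { return 1; }
};

union Holder {
    int value;  // first member: active after Holder h{42}
    Tag tag;
};

int call_after_activation() {
    Holder h{42};
    // int bad = h.tag.id();  // undefined: tag is outside its lifetime here
    h.tag = Tag{};            // assigning to the member starts its lifetime (C++17 onward)
    return h.tag.id();        // fine: tag is now the active member
}

void demo() {
    assert(call_after_activation() == 1);
}
```

Note that this does not rescue the property trick: making `tag` active ends the lifetime of `value`, just as activating `x` in the question would end the lifetime of `xy`.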


A possible workaround is to use the C++20 attribute [[no_unique_address]]:

struct Point
{
    int& get_x() { return xy[0]; }
    int& get_y() { return xy[1]; }
    [[no_unique_address]] property<Point, int, &Point::get_x> x;
    [[no_unique_address]] property<Point, int, &Point::get_y> y;
    std::array<int, 2> xy;
};

static_assert(offsetof(Point, x) == 0 && offsetof(Point, y) == 0);
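
The effect of the attribute on size can be sketched in isolation. The types `empty_x`, `empty_y`, `plain`, and `overlapped` below are hypothetical stand-ins, not the question's `property` and `Point`. Because an implementation may ignore the attribute, the assertion only checks that it does not make things worse; implementations that honor it are expected to make `overlapped` strictly smaller than `plain`.

```cpp
#include <array>

// Hypothetical empty stand-ins for the two property types.
struct empty_x {};
struct empty_y {};

// Without the attribute, each empty member occupies at least one byte.
struct plain {
    empty_x x;
    empty_y y;
    std::array<int, 2> xy;
};

// With the attribute, the empty members may share storage with xy.
struct overlapped {
    [[no_unique_address]] empty_x x;
    [[no_unique_address]] empty_y y;
    std::array<int, 2> xy;
};

static_assert(sizeof(overlapped) <= sizeof(plain),
              "the attribute should not enlarge the layout");
```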
Community
Passer By
  • Oh, so the permission to examine members of inactive objects does not extend to member functions. This is unfortunate and looks like a defect to me. – n. m. could be an AI Feb 10 '19 at 15:51
  • @n.m. I'm surprised as well, didn't think [basic.life] would outright ban such usage. Particularly so since calling through a null pointer is arguably well-defined. – Passer By Feb 10 '19 at 15:54
  • @n.m.: Why is that a defect? It makes sense; the common initial sequence rule is about reading a value created through a different union member. It's not about allowing you to use unions in whatever way you want. Unions are supposed to have *only one* active member; talking to an inactive member is *supposed* to be wrong. The common initial sequence rule just specifies a specific case where it's OK to read a piece of data written through the active member. – Nicol Bolas Feb 10 '19 at 15:54
  • @NicolBolas Any access to a data member should be possible to encapsulate in a member function. One is allowed to access x.y but not x.y_() which in turn only accesses x.y. This doesn't look right. What would the rationale for disallowing x.y_() be? – n. m. could be an AI Feb 10 '19 at 16:00
  • 1
    @n.m.: Because it doesn't make sense. You're allowed to access `x.y` because the compiler can clearly see that you're accessing a specific member variable. The scope of your action is bounded, and it is clear to all what the state of things is. Calling a member function could do *anything* (as evidenced by this very example, where you reach out into some other object to get the reference). The scope of the action is unbounded. And personally, I would say that allowing it makes a mockery of the object model. – Nicol Bolas Feb 10 '19 at 16:03
  • It looks like `[[no_unique_address]]` will solve the problem indeed. One can use offsetof to calculate the reverse offset too, instead of asserting it's zero. – n. m. could be an AI Feb 10 '19 at 16:04
  • 2
    @n.m.: The annoying part of the `no_unique_address` solution is that you would naturally want to make the actual members *private* while leaving the "properties" public, but doing so breaks standard layout. And if you break standard layout, there is a much better chance that the layout of the type will be disturbed by the presence of `no_unique_address` members (not to mention breaking `offsetof`). Which is why I think that "attribute" should have been a keyword with actual behavior behind it, not merely a suggestion. – Nicol Bolas Feb 10 '19 at 16:11
  • @NicolBolas calling a member function passing the offending pointer as `this` isn't very much different from calling a non-member function passing that same pointer as any old argument. The latter is however allowed, while the former is not. – n. m. could be an AI Feb 10 '19 at 16:13
  • @n.m.: You're confusing mechanism with intent. Calling a member function is mechanically similar to calling a non-member function with the same `this` pointer. But the *intent* behind these things is altogether different. If you call a member function of an object, that *means something*, something which is fundamentally different from passing any old parameter to a non-member function. That meaning is why we bother to have member functions at all. – Nicol Bolas Feb 10 '19 at 16:18
  • @NicolBolas offsetof is conditionally supported for non-standard-layout types since c++17. There's no reason why it wouldn't be supported in most implementations. – n. m. could be an AI Feb 10 '19 at 16:21
  • @NicolBolas um, no. This is entirely not true. If I call a member function, it's because I want it to perform a certain action, not because I want to make some kind of deep philosophical statement. I wouldn't bother having any non-virtual member functions if I could. Everything could have been expressed as friend functions instead. There's neither a technical nor a philosophical reason to allow certain kinds of operations as members only, but purely an aesthetic one. `friend operator=(myclass&, const myclass&)` isn't any worse than the standard form and conveys the intent just as well. Oh well.. – n. m. could be an AI Feb 10 '19 at 16:36
  • 1
    @n.m.: "*If I call a member function, it's because I want it to perform a certain action, not because I want to make some kind of deep philosophical statement.*" But if you *write* a function as a member, you *are* making a "deep philosophical statement" about the relationship between that function and the object it is a member of. That you *personally* don't care about that "philosophical statement" doesn't mean it isn't there. This is part of why unified function call syntax is non-workable. – Nicol Bolas Feb 10 '19 at 17:45
  • @NicolBolas no, I don't. Please read what I wrote again. I write a member function only when the language leaves me no choice. I don't see any non-cosmetic difference between member and non-member functions. If you do, fine, but don't try to impose your view on me. Or at least try to explain first how you decide to implement, say, `operator+` as a member or as a non-member. Why does the language allow both choices anyway? – n. m. could be an AI Feb 10 '19 at 18:00
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/188190/discussion-between-nicol-bolas-and-n-m). – Nicol Bolas Feb 10 '19 at 18:02

Here is what the common-initial-sequence rule says about unions:

In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated.

Your code does not qualify. Why? Because you are not reading from "another union member". You are doing m.x = 42;. That isn't reading; that's calling a member function of another union member.

So it doesn't qualify for the common initial sequence rule. And without the common-initial-sequence rule to protect you, accessing non-active members of the union is UB.
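
For contrast, here is a sketch of what the rule does cover: reading a data member that lies in the common initial sequence of two standard-layout structs. All names (`Circle`, `Label`, `Shape`, `kind_via_label`, `demo`) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Two standard-layout structs whose common initial sequence is `int kind`.
struct Circle { int kind; float radius; };
struct Label  { int kind; char initial; };

union Shape {
    Circle circle;
    Label label;
};

// Permitted: `kind` is read through the inactive member `label`, but it is
// part of the common initial sequence of Circle and Label.
int kind_via_label(const Shape& s) {
    return s.label.kind;
}

void demo() {
    Shape s = { Circle{7, 1.5f} };  // aggregate initialization makes `circle` active
    assert(kind_via_label(s) == 7);
    // s.label.initial would not be covered: it lies outside the common
    // initial sequence, and a member function call is not a read at all.
}
```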

Nicol Bolas
  • unfortunately calling a (non-virtual) member function is indeed an action that is separately disallowed for objects out of their lifetime; as I said it probably shouldn't be, but there's little we can do. – n. m. could be an AI Feb 10 '19 at 16:09