Is a data race on vptr explicitly illegal?

Question

Before you go further, a note: this is purely a language lawyer question. I wish to get answers based on standard quotes. I am not looking for advice on writing C++ code. Please answer as if I was a compiler writer.

During construction of an object with only exclusive subobjects (#), notably one with only non-virtual bases (or with a virtual base class named only once), the dynamic type of an lvalue referring to a base class subobject "increases": it goes from the type of the base to the type of the class whose constructor is running.

(#) A subobject is exclusive when it is a direct subobject of exactly one other object (which may be another subobject or a complete object). A member and a non virtual base are always exclusive.

During destruction, the type decreases (until the end of the body of the destructor of that subobject, where the subobject is gone and has no dynamic type anymore).
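
As a minimal single-threaded sketch of this "increase" and "decrease", the trace below records which override a virtual call reaches at each stage. The names `Base`, `Der`, `g_log` and `trace_lifetime` are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Hypothetical names, chosen for this illustration only.
std::string g_log;  // records which override each virtual call reaches

struct Base {
    Base() { g_log += name(); g_log += ' '; }           // dynamic type is Base here
    virtual ~Base() { g_log += name(); g_log += ' '; }  // back to Base here
    virtual const char* name() const { return "Base"; }
};

struct Der : Base {
    Der() { g_log += name(); g_log += ' '; }            // dynamic type is now Der
    ~Der() override { g_log += name(); g_log += ' '; }  // still Der
    const char* name() const override { return "Der"; }
};

// Constructs and destroys one Der on the calling thread, returns the trace.
std::string trace_lifetime() {
    g_log.clear();
    { Der d; }
    return g_log;  // "Base Der Der Base "
}
```

Everything here runs on one thread, so the order of the type changes relative to the virtual calls is fully determined; the question is what remains of that once a second thread makes the calls.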

[During construction of an object with shared base class subobjects (that is, in a class with distinct base subobjects with at least one virtual base), the dynamic type of a base subobject can "disappear" temporarily. I do not wish to discuss such classes here.]

The real question is: What happens if the dynamic type of the object is increased in another thread?

The title of the question, which is a standard C++ question, is expressed using a non-standard term (vptr), which may look contradictory. The reasons are:

There is no requirement that polymorphism is implemented in terms of a vptr, but it (almost?) always is. The one (or many) vptr in an object represents the dynamic type of a polymorphic object.
Data races are defined in terms of read/write operations on a memory location.
The standard text often uses non-standard elements "for exposition only" to define standard features. (So, why not use the vptr "for exposition only"?)

The standard does not define the behavior of polymorphic objects (*) directly as a function of their dynamic type; the standard specifies which expressions are allowed during the so-called "lifetime" (after the constructor has completed), inside the body of the constructor of the most derived type (exactly the same expressions are allowed, with the same semantics), also inside the base class subobject constructors...

(*) Dynamic behavior of polymorphic or dynamic objects(**) includes: virtual calls, derived to base conversions, down casts (static_cast or dynamic_cast), typeid of a polymorphic object.

(**) A dynamic object is one such that its class uses the virtual keyword; its constructor is not trivial for that reason.

So the description says: After something has finished, as soon as something started, before something else, etc. some expression is valid and does such and such.

The specification of construction and destruction was written before threads were part of standard C++. So what changed with the standardization of threads? There is one sentence which defines threading behavior (the normative part), [basic.life]/11:

In this subclause, “before” and “after” refer to the “happens before” relation ([intro.multithread]).

So it's clear that an object is seen as fully constructed iff there is a happens-before relation between the completion of the invocation of the constructor and the use of the object, and also a happens-before relation between that use of the object and the invocation of the destructor (if it's invoked at all).
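
For contrast with the problematic case, here is a sketch of a use that does have such a happens-before relation: the pointer is published with a release store only after the constructor has completed, and the consumer reads it with an acquire load. The names `Node`, `Leaf` and `publish_after_construction` are hypothetical, and release/acquire is just one way to obtain the relation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical types for this illustration only.
struct Node {
    virtual ~Node() = default;
    virtual int id() const { return 1; }
};
struct Leaf : Node {
    int id() const override { return 2; }
};

int publish_after_construction() {
    std::atomic<Node*> slot{nullptr};
    int seen = 0;

    std::thread consumer([&] {
        Node* p;
        // The acquire load pairs with the release store below.
        while (!(p = slot.load(std::memory_order_acquire))) {
            std::this_thread::yield();
        }
        seen = p->id();  // happens after Leaf's constructor completed
    });

    Leaf* leaf = new Leaf;                        // construction completes here
    slot.store(leaf, std::memory_order_release);  // publication comes afterwards
    consumer.join();                              // makes reading `seen` safe
    delete leaf;
    return seen;  // 2
}
```

This is not the case the question is about; it only pins down what "after the constructor has completed" means across threads.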

But it doesn't say what happens during the construction of derived classes, after a base class subobject has been constructed: obviously there is a race condition if any dynamic property is used for a polymorphic object under construction, but race conditions are not illegal.

[A race condition is a case of non-determinism, and any meaningful use of a mutex, condition variable, rwlocks, many uses of semaphores, many uses of other synchronisation devices, and all uses of atomic primitives introduce a race condition at least at the level of the modification order on the atomic object. Whether that low level non-determinism results in unpredictable high level behavior depends on the way the primitives are used.]
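
A small sketch of a race condition that is not a data race, using the hypothetical function `last_writer`: two threads store to the same atomic object, the modification order decides which value survives, and the behavior is nonetheless fully defined.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper for this illustration only.
int last_writer() {
    std::atomic<int> winner{0};

    // Both stores are atomic, so there is no data race, but which one
    // comes last in the modification order is not determined.
    std::thread a([&] { winner.store(1, std::memory_order_relaxed); });
    std::thread b([&] { winner.store(2, std::memory_order_relaxed); });
    a.join();
    b.join();

    return winner.load(std::memory_order_relaxed);  // either 1 or 2
}
```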

Then the standard draft goes on to say:

[ Note: Therefore, undefined behavior results if an object that is being constructed in one thread is referenced from another thread without adequate synchronization. — end note ]

Where is "adequate synchronization" defined?

Is the lack of "adequate synchronization" the moral equivalent of a regular data race: a data race on the vptr, or in standard speak, a data race on the dynamic type?

For simplicity, I wish to restrict the scope of the question to single inheritance, at least as a first step. (The standard is awfully confused about the construction of objects with multiple inheritance anyway.)

This is a language-lawyer question, so I'm not interested in:

whether using an object that is in the process of being constructed in another thread is advisable (it's probably not advisable);
how to use synchronization to reliably fix that race condition;
whether compiler vendors wish to support such a use case (they probably do not and will not);
whether that could possibly work reliably in any real world implementation (it probably will not work reliably in non trivial cases with current implementations).

EDIT: The previous example, instead of illustrating the issue, was a distraction. It caused a very interesting but completely irrelevant discussion in the chat section.

Here is a cleaner example that will not cause the same issue:

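#include <atomic>

struct Base1;                 // forward declaration so the atomic can name the type
std::atomic<Base1*> shared;   // static storage: starts out null

struct Base1 {
  virtual void f() {}
};

struct Base2 : Base1 {
  virtual void f() {}
  Base2 () { shared = (Base1*)this; }  // published before Der2's constructor body runs
};

struct Der2 : Base2 {
  virtual void f() {}
};

void use_shared() {
  Base1 *p;
  while (! (p = shared.load()));  // spin until thread A publishes the pointer
  p->f();                         // virtual call that may overlap Der2's construction
}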

With the consumer/producer logic:

Thread A: new Der2;
Thread B: use_shared();

For reference, original example:

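#include <atomic>

struct Base;                 // forward declaration so the atomic can name the type
std::atomic<Base*> shared;   // static storage: starts out null

struct Base {
  virtual void f() {}
  Base () { shared = this; }  // published while Base's own constructor is still running
};

struct Der : Base {
  virtual void f() {}
};

void use_shared() {
  Base *p;
  while (! (p = shared.load()));  // spin until thread A publishes the pointer
  p->f();
}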

Consumer/producer logic:

Thread A: new Der;
Thread B: use_shared();

It wasn't clear that `this` could be used by another thread during the execution of the Base constructor; that is an interesting issue, but it is irrelevant to the issue of using a base class subobject while a derived constructor runs in another thread.

Additional information

For reference, the DR that "motivated" the current phrasing (although that explains nothing):

Core Language Defect Report #710

"*What happens if the dynamic type of the object is increased in another thread?*" How could that happen? You can't shift half of a constructor to another thread. Constructor completion is static and will happen on the thread that started the construction. — Nicol Bolas, Jun 30 '18 at 19:30
@NicolBolas The object is constructed in one thread and its dynamic features are used in another one. — curiousguy, Jun 30 '18 at 19:35
"I'm not interested in [list of all things that address this problem]" is pretty arrogant. What *exactly* are you looking for here? This question could probably be boiled down to one paragraph if you tried. — tadman, Jun 30 '18 at 19:41
@tadman It's a [tag:language-lawyer] question: "For questions **about the intricacies of formal or authoritative specifications** of programming languages and environments." If by "arrogant" you mean that you aren't interested in questions regarding "the intricacies of formal or authoritative specifications", well, the solution is to ignore such questions. "list of all things that address this problem" no **they do not address that problem**. The problem relates to what the standard says on such case, not how to avoid the case. — curiousguy, Jun 30 '18 at 19:44
Objection! The question doesn't clearly state the problem. Pushing that new code snippet to the top and deleting 75% of your mostly redundant notes would make this question significantly more focused and would help you get the answer you're looking for. I'm sure that research was useful to you personally, but it impedes understanding of your question, there's just too much to try and make sense of. — tadman, Jun 30 '18 at 19:47
One thing to note is that in your example you don't show how thread B is synchronized with A so that the `use_shared` function is called only after `new Der` is complete. This could be solved with a lazy initializer that has a mutex on your shared instance. — tadman, Jun 30 '18 at 19:48
@tadman "_deleting 75% of your mostly redundant notes_" The "redundant notes" explain in detail the problem. You haven't read them. The problem is understanding what the standard says about such code, **not fixing the code**. "_you don't show how thread B is synchronized with A_" That's the whole point: they are not. They run concurrently so **there is a race condition.** That's explained in the "redundant notes". — curiousguy, Jun 30 '18 at 19:50
I've read them, but they just go into the weeds pretty fast. Speak through your code first, your remarks second, and notes *where necessary*. Yes, there is a race condition. Do you want to fix it? This has inadequate synchronization unless you think that spin loop is a good idea, which it really isn't. Pinning a core because you aren't willing to use a mutex isn't good programming. — tadman, Jun 30 '18 at 19:53
@tadman It's an **example** of the problem. I am *not* recommending such practices. Do you want me to make the example code longer and more verbose just to avoid the spinning? It would be easy, I just don't see the point: the code illustrates the problem. Also, the race would probably be much less racy if one thread was sleeping, don't you think? — curiousguy, Jun 30 '18 at 19:56
You can properly synchronize things here by ensuring *any* thread can initialize the object if it's the first to call that `get()` function, just use a mutex to prevent any double-initialization race conditions, or you can have that object initialized before any other threads are created. Both of those are adequate synchronization. That code doesn't have a race condition so much as it's just extremely inefficient and sloppy. — tadman, Jun 30 '18 at 19:58
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174093/discussion-between-curiousguy-and-tadman). — curiousguy, Jun 30 '18 at 20:00

score 3 · Answer 1 · edited Jun 20 '20 at 09:12

My reading of the standard is that there's a data race and therefore undefined behavior, but the standard addresses it very indirectly.

[basic.life]/1 The lifetime of an object of type T begins when ... its initialization is complete.

When shared = this; is executed, the lifetime of the Base object, let alone Der, hasn't started yet.

[basic.life]/6 Before the lifetime of an object has started but after the storage which the object will occupy has been allocated ... any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise ... [t]he program has undefined behavior if ... the pointer is used to access a non-static data member or call a non-static member function of the object.

[basic.life]/11 In this section, “before” and “after” refer to the “happens before” relation (4.7). [ Note: Therefore, undefined behavior results if an object that is being constructed in one thread is referenced from another thread without adequate synchronization. —end note ]

So the default position of [basic.life] is that a call to an object's method that doesn't happen-after its initialization is completed exhibits undefined behavior. But [class.cdtor] may have more to say.

[class.cdtor]/3 Member functions, including virtual functions (13.3), can be called during construction or destruction (15.6.2). When a virtual function is called directly or indirectly from a constructor or from a destructor ...

Thus, [class.cdtor] only addresses the case where the virtual function is called directly or indirectly from the constructor (necessarily on the same thread on which the constructor itself runs). It's silent on the case where a method is called from another thread, as in the example. I take it to mean that [basic.life] controls, and the behavior of the example is undefined.
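
A minimal sketch of the case that [class.cdtor]/3 does cover, a virtual call made indirectly from the constructor on the constructing thread. The names `Base`, `Der`, `describe` and `seen_in_ctor` are hypothetical and used only for illustration.

```cpp
#include <string>

// Hypothetical names for this illustration only.
struct Base;
std::string describe(const Base* b);  // free function, defined below

struct Base {
    std::string seen_in_ctor;
    // The constructor hands `this` to a free function, which makes the
    // virtual call: an indirect call from the constructor, same thread.
    Base() { seen_in_ctor = describe(this); }
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Der : Base {
    std::string name() const override { return "Der"; }
};

std::string describe(const Base* b) { return b->name(); }
```

With `Der d;`, `d.seen_in_ctor` holds "Base" because the call was made while the Base constructor was running, whereas `describe(&d)` after construction returns "Der". The same call to `describe` made from another thread during construction is the case the quoted text does not address.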

answered Jun 30 '18 at 21:21

Igor Tandetnik


"Called indirectly" means constructor calls, say, a free function passing `this`, and the function turns around and calls the method through that pointer. That necessarily has to happen on the same thread. I can't think of any situation where a) a method is called on the same thread where a constructor is currently executing, but b) it's not called directly or indirectly from that constructor (that is, the constructor is not on the call stack of that thread, somehow); that may be a lack of imagination on my part. – Igor Tandetnik Jun 30 '18 at 21:33
Inside the body of the constructor of your class, `mem` is already fully constructed, so the issue fails to arise. – Igor Tandetnik Jun 30 '18 at 21:45
**[class.base.init]** describes the order of initialization of various sub-objects of the object in detail. – Igor Tandetnik Jun 30 '18 at 21:50
Answering your question on whether standard library functions may use parallel algorithms internally: **[res.on.data.races]/8** Unless otherwise specified, C++ standard library functions shall perform all operations solely within the current thread if those operations have effects that are visible (4.7) to users. – Igor Tandetnik Jun 30 '18 at 21:51
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174098/discussion-between-curiousguy-and-igor-tandetnik). – curiousguy Jun 30 '18 at 21:56
This answer raises very interesting points about the example that was intended to illustrate the question, but doesn't address the question, which is about the race condition on the vptr. – curiousguy Jul 02 '18 at 08:19
You want a language-lawyerly answer for a question that mentions a term appearing nowhere in the standard. The standard says, in the end (by my reading at least) - you can call virtual methods from inside the constructor, or after the constructor of the most-derived object is completed. An attempt to call a virtual method in parallel from another thread while the constructor is still running exhibits undefined behavior. This is likely inspired by a typical implementation's need to adjust vptr during construction, but is formulated in a way that avoids mentioning vptr by name. – Igor Tandetnik Jul 02 '18 at 12:27
For additional discussion, see [DR#710 Data races during construction](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#710). Though it doesn't really explain much. – Igor Tandetnik Jul 02 '18 at 12:28
"_An attempt to call a virtual method in parallel from another thread while the constructor is still running exhibits undefined behavior_" This is extremely problematic if true. It means that any use of a parallel algorithm is problematic inside a constructor, even from the most derived class constructor. It means that any library must document whether it uses threads in any function. Which library does that? (except std lib) – curiousguy Jul 02 '18 at 14:31
"_question that mentions a term appearing nowhere in the standard_" A term that doesn't appear but that's unwritten in that proposed modification in a DR you link to: "A constructor for a class with virtual functions or virtual base classes **modifies a memory location in the object**". A memory location in the object modified by constructors: that's what normal people call a vptr! – curiousguy Jul 02 '18 at 14:45
DRs are often written from the implementor's point of view, often by the compiler implementors themselves. The standard doesn't live in the abstract - it must be written so that it's actually implementable; and moreover, since implementation often precedes standardization, in a way that comports, to the greatest extent possible, with existing implementation practice, so as not to suddenly render existing compilers non-conforming. Major compiler developers are, of course, represented on the standardization committee; and are, of course, very well aware of the care and feeding of vtables. – Igor Tandetnik Jul 02 '18 at 15:01
*"any library must document whether it uses threads in any function"* Not really. It must only document whether it calls back into the user code from another thread. Which would be a good idea anyway - one doesn't normally write every function under assumption that it may be called concurrently from arbitrary threads at any time without notice. Do you routinely synchronize access to every data member in every member function, just in case some library you use unexpectedly calls the method on its internal thread? – Igor Tandetnik Jul 02 '18 at 15:04
"_Not really_" Yes: **according to you**, `this` isn't usable from another thread! "_an object's method that doesn't happen-after its initialization is completed exhibits undefined behavior_" and "_(necessarily on the same thread on which the constructor itself runs)_" – curiousguy Jul 02 '18 at 18:23
"Uses threads in any function" != "touches `this` from another thread". If it does touch `this` from another thread, it better document that. Having your object called on a thread you didn't expect it being called on is problematic whether or not the object's constructor has completed. Most functions aren't written to be safe to call concurrently. – Igor Tandetnik Jul 02 '18 at 21:32
Except for the case of `this` in a constructor passed explicitly or implicitly to a library function, is there any case where you need to know that e.g. a pointer to member might be used in another thread? The constructor is a pathological, unique case, according to your reading of the std. No other function ever has the same risk of UB, and no existing compiler will ever have unpredictable behavior because **the vptr acts at worst like a data member**. – curiousguy Jul 03 '18 at 00:35
(...) and some ppl in the committee wanted the dynamic type to work like an atomic, which is probably not even doable without a complete change to vptr use, possibly even a mutex on all virtual calls! – curiousguy Jul 03 '18 at 00:36
You yourself showed examples where decisions were made based on thread ID - those decisions will be wrong when called on unexpected thread. A method could access thread-local variables. A method is likely to access or modify a data member, and could access or modify a global variable - if such a method is called concurrently from multiple threads, it'll exhibit a data race where there was none before. Constructor or no constructor, calling a function concurrently that wasn't specifically written to be called concurrently is undefined behavior just waiting to happen. – Igor Tandetnik Jul 03 '18 at 01:53
Obviously, parallelizing callbacks without permission is not allowed by the (implicit) contract of any decent library. There is no question about that (there is that other question about whether std lib formally has the guarantee though). That wasn't my point! The point is that any use of the object `*this`, not just callbacks, is UB (according to you) in other threads during construction. That interpretation is problematic, to say the least. – curiousguy Jul 03 '18 at 17:30
What kind of "use" do you envision the library put `*this` to, if not calling member functions on it? Do you have any particular use case in mind? Any specific example that would cause problems? Further, I didn't say *any* use is prohibited - only those uses explicitly enumerated in **[basic.life]/6**. The standard states: *"any pointer that represents the address of the storage location where the object will be or was located **may** be used but only in **limited** ways."* Emphasis mine. Can you show a concrete example where those limited ways are too limiting? – Igor Tandetnik Jul 03 '18 at 18:14
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174284/discussion-between-curiousguy-and-igor-tandetnik). – curiousguy Jul 03 '18 at 18:20
