Implementation of Double Checked Locking in C++ 98/03 using volatile

Question

While reading this article about the Double-Checked Locking Pattern (DCLP) in C++, I reached the place (page 10) where the authors demonstrate an attempt to implement DCLP "correctly" using volatile variables:

class Singleton {
public:
  static volatile Singleton* volatile instance();

private:
  static volatile Singleton* volatile pInstance;
};

// from the implementation file
volatile Singleton* volatile Singleton::pInstance = 0;
volatile Singleton* volatile Singleton::instance() {
  if (pInstance == 0) {
    Lock lock;
    if (pInstance == 0) {
      volatile Singleton* volatile temp = new Singleton;
      pInstance = temp;
    }
  }
  return pInstance;
}

After this example there is a passage that I don't understand:

First, the Standard’s constraints on observable behavior are only for an abstract machine defined by the Standard, and that abstract machine has no notion of multiple threads of execution. As a result, though the Standard prevents compilers from reordering reads and writes to volatile data within a thread, it imposes no constraints at all on such reorderings across threads. At least that’s how most compiler implementers interpret things. As a result, in practice, many compilers may generate thread-unsafe code from the source above.

and later:

... C++’s abstract machine is single-threaded, and C++ compilers may choose to generate thread-unsafe code from source like the above, anyway.

These remarks concern execution on a uniprocessor, so they are definitely not about cache-coherence issues.

If the compiler can't reorder reads and writes to volatile data within a thread, how can it reorder reads and writes across threads in this particular example, and thus generate thread-unsafe code?

That article is from 2004, thus predating C++11 and its updated abstract machine by a fair number of years. — molbdnilo, Sep 12 '16 at 18:05
@molbdnilo OK, but how can the statement that the compiler can reorder reads and writes _across_ threads be explained? – undermind, Sep 12 '16 at 18:08

score 2 · Answer 1 · answered Sep 12 '16 at 22:08

The pointer to the Singleton may be volatile, but the data within the singleton is not.

Imagine Singleton has int x, y, z; as members, set to 15, 16, 17 in the constructor for some reason.

  volatile Singleton* volatile temp = new Singleton;
  pInstance = temp;

OK, temp is written before pInstance. When are x, y, z written relative to those? Before? After? You don't know. They aren't volatile, so their writes don't need to be ordered relative to the volatile accesses.

Now a thread comes in and sees:

if (pInstance == 0) {  // first check

And let's say pInstance has been set and is not null. What are the values of x, y, z? Even though new Singleton has been called and the constructor has "run", you don't know whether the operations that set x, y, z have run or not.

So now your code goes and reads x, y, z and crashes, because it was really expecting 15, 16, 17, not random data.

Oh wait, pInstance is a volatile pointer to volatile data! So x, y, z are volatile, right? Right? And thus ordered with pInstance and temp. Aha!

Almost. Any reads through *pInstance will be volatile, but the construction via new Singleton was not volatile. So the initial writes to x, y, z were not ordered. :-(
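To make this scenario concrete, here is a sketch of the question's class with this answer's hypothetical members x, y, z added. The `Lock` class is an assumption: the article never shows it, so here it is a scoped guard over a `std::mutex` purely so the code compiles (std::mutex is C++11; a C++98/03 build would wrap a platform lock instead). The comments mark which writes are volatile and which are not. A single-threaded run cannot show the race; the sketch only shows where the unordered writes sit.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the article's scoped 'Lock'. std::mutex is C++11;
// in C++98/03 this would wrap a platform lock such as a pthread mutex.
std::mutex singletonMutex;

class Lock {
public:
    Lock() { singletonMutex.lock(); }
    ~Lock() { singletonMutex.unlock(); }
private:
    Lock(const Lock&);             // non-copyable, C++98 style
    Lock& operator=(const Lock&);
};

class Singleton {
public:
    // The question returns 'volatile Singleton* volatile'; a top-level
    // volatile on a return value has no effect, so it is dropped here.
    static volatile Singleton* instance();

    int x, y, z;  // plain ints: the constructor's writes are NOT volatile

private:
    Singleton() : x(15), y(16), z(17) {}
    static volatile Singleton* volatile pInstance;
};

volatile Singleton* volatile Singleton::pInstance = 0;

volatile Singleton* Singleton::instance() {
    if (pInstance == 0) {        // first check, no lock held
        Lock lock;
        if (pInstance == 0) {    // second check, under the lock
            // 'new Singleton' runs the constructor, whose writes to x, y, z
            // are ordinary stores. Only the two writes below are volatile,
            // and volatile ordering says nothing about the non-volatile
            // stores, nor about what another thread observes.
            volatile Singleton* volatile temp = new Singleton;
            pInstance = temp;    // volatile write that publishes the pointer
        }
    }
    return pInstance;
}
```

Called from one thread, both calls return the same fully constructed object. The problem described above only appears when a second thread reaches the first check while the first thread is still inside `new Singleton`.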

So you could, maybe, make the members volatile int x, y, z; OK. However...

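C++ now has a memory model, even if it didn't when the article was written. Under the current rules, volatile does not prevent data races; volatile has nothing to do with threads. Reading pInstance at the first check while another thread writes it, without synchronization, is a data race, so the program has undefined behavior. Cats and dogs living together.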

Also, although this is pushing the limits of the standard (i.e., it gets vague as to what volatile really means), an all-knowing, all-seeing, full-program-optimizing compiler could look at your uses of volatile and say "no, those volatiles don't actually connect to any I/O memory addresses etc., they really aren't observable behaviour, I'm just going to make them non-volatile"...
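For comparison, here is a sketch of how the same lazily created object is commonly written under the C++11 memory model mentioned above. Neither option exists in C++98/03. `meyersInstance` and `dclpInstance` are illustrative names, and `Singleton` reuses this answer's hypothetical x, y, z members.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Singleton {
    int x, y, z;
    Singleton() : x(15), y(16), z(17) {}
};

// Option 1 (C++11 and later): a function-local static. The standard guarantees
// its initialization runs exactly once even when several threads call this
// concurrently, so no hand-written double-checked locking is needed.
Singleton& meyersInstance() {
    static Singleton instance;
    return instance;
}

// Option 2 (C++11 and later): double-checked locking with std::atomic.
std::atomic<Singleton*> pInstance(nullptr);
std::mutex initMutex;

Singleton* dclpInstance() {
    // Acquire load: if it sees a non-null pointer, it also sees every write
    // the constructor made before the matching release store.
    Singleton* p = pInstance.load(std::memory_order_acquire);   // first check
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(initMutex);
        // Relaxed is enough here: the mutex already orders this thread
        // after any earlier thread that initialized the object.
        p = pInstance.load(std::memory_order_relaxed);          // second check
        if (p == nullptr) {
            p = new Singleton;                                   // writes x, y, z
            pInstance.store(p, std::memory_order_release);      // publish after construction
        }
    }
    return p;
}
```

The function-local static is usually the simpler choice. The atomic version keeps the double-checked shape of the original code while making the publication of the pointer well-defined.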

I'm sorry, I actually omitted part of the article I'm referring to in the question (I thought leaving it out would make the question clearer). In that part the authors address the problem you're describing: the Singleton's fields are initialized in the constructor like this: `static_cast<volatile int&>(x) = 5;`. But even then, the authors still claim that this solution doesn't prevent the generation of thread-unsafe code. I know that modern C++ has an elaborate memory model, but I was curious what the authors meant. – undermind, Sep 12 '16 at 22:24
OK, I've now (re-)read the article (it has been a few years). I'm tempted to say that the bit you are quoting, at the end of one section and leading into the next, doesn't carry much weight. (Also, cache coherency usually isn't the problem on multiprocessors, since all common architectures guarantee it; the problem is the read/write buffers. So it is a good article, but not necessarily perfect.) Anyhow, maybe tons of `volatile` plus a uniprocessor could actually be thread-safe. But `volatile` is poorly defined, so I wouldn't count on it. – tony, Sep 14 '16 at 05:56
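A sketch of the variant undermind describes, in which the constructor writes each member through a cast to `volatile int&`. Only the `x = 5` cast comes from the comment; the second member `y` and its value are added for illustration, and `Lock` is again a hypothetical `std::mutex` wrapper so that the sketch compiles. Following the argument quoted in the question, this orders the writes only within the single-threaded abstract machine.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the article's 'Lock' (std::mutex only so it compiles).
std::mutex singletonMutex;

class Lock {
public:
    Lock() { singletonMutex.lock(); }
    ~Lock() { singletonMutex.unlock(); }
private:
    Lock(const Lock&);
    Lock& operator=(const Lock&);
};

class Singleton {
public:
    static volatile Singleton* instance();
    int x, y;

private:
    Singleton() {
        // Each initializing write goes through a volatile lvalue, so within
        // this thread it cannot be reordered with other volatile accesses,
        // including the later volatile write to pInstance.
        static_cast<volatile int&>(x) = 5;
        static_cast<volatile int&>(y) = 10;  // illustrative second member
    }
    static volatile Singleton* volatile pInstance;
};

volatile Singleton* volatile Singleton::pInstance = 0;

volatile Singleton* Singleton::instance() {
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            volatile Singleton* volatile temp = new Singleton;
            // Still no guarantee across threads: the C++03 abstract machine is
            // single-threaded, so nothing forces another thread (or CPU) to
            // observe these volatile writes in this order.
            pInstance = temp;
        }
    }
    return pInstance;
}
```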

score 1 · Answer 2 · answered Sep 12 '16 at 17:30


I think they're referring to the cache coherency problem discussed in section 6 ("DCLP on Multiprocessor Machines"). On a multiprocessor system, the processor/cache hardware may write out the value for pInstance before the values for the allocated Singleton are written out. This can cause a second CPU to see the non-NULL pInstance before it can see the data it points to.

This requires a hardware fence instruction to ensure all the memory is updated before other CPUs in the system can see any of it.
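C++98/03 has no portable way to request the fence mentioned here. In C++11 and later, `std::atomic_thread_fence` expresses it portably, and the compiler emits whatever barrier instructions the target needs. A minimal sketch, assuming a hypothetical `Singleton` with members x, y, z: a release fence before publishing the pointer pairs with an acquire fence after reading it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Singleton {
    int x, y, z;
    Singleton() : x(15), y(16), z(17) {}
};

std::atomic<Singleton*> pInstance(nullptr);
std::mutex initMutex;

Singleton* instance() {
    Singleton* p = pInstance.load(std::memory_order_relaxed);    // first check
    // Acquire fence: if the load above read a pointer published after the
    // writer's release fence, the writer's earlier stores are visible below.
    std::atomic_thread_fence(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(initMutex);
        p = pInstance.load(std::memory_order_relaxed);           // second check
        if (p == nullptr) {
            p = new Singleton;                                    // writes x, y, z
            // Release fence: x, y, z are ordered before the store below.
            std::atomic_thread_fence(std::memory_order_release);
            pInstance.store(p, std::memory_order_relaxed);       // publish
        }
    }
    return p;
}
```

Because the release fence precedes the relaxed store, and the acquire fence follows the relaxed load that reads that store, a thread that sees a non-null pointer also sees x, y, z as written by the constructor.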

1201ProgramAlarm

It doesn't really seem so. They remark at the end of section 5 that multiprocessor execution may introduce even more problems; besides, section 6 is explicitly named _DCLP on Multiprocessor Machines_, and in section 1, _Introduction_, we can read: "This article explains why Singleton isn’t thread safe, how DCLP attempts to address that problem, why DCLP may fail on **both uni- and multiprocessor** architectures, and why you can’t (portably) do anything about it." – undermind Sep 12 '16 at 17:44

score 0 · Answer 3 · answered Sep 12 '16 at 16:31


If I'm understanding correctly, they are saying that in the context of a single-threaded abstract machine the compiler may simply transform:

volatile Singleton* volatile temp = new Singleton;
pInstance = temp;

Into:

pInstance = new Singleton;

This is allowed because the observable behavior is unchanged, and it brings us back to the original problem with double-checked locking.
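The "original problem" is usually explained by splitting `pInstance = new Singleton` into three steps: allocate memory, construct the object, and store the address into pInstance. In the article's argument, a compiler reasoning about a single-threaded abstract machine may do the store before the construction. The sketch below performs that reordered sequence by hand in one thread, using `operator new` and placement new, to show the window in which pInstance is non-null but the constructor has not run. `publishBeforeConstruct`, `destroyInstance`, and the members x, y, z are hypothetical, not code from the article.

```cpp
#include <cassert>
#include <new>

struct Singleton {
    int x, y, z;
    Singleton() : x(15), y(16), z(17) {}
};

Singleton* pInstance = 0;

// Performs 'pInstance = new Singleton' as allocate -> publish -> construct.
// Returns true if the pointer was already non-null before the constructor ran.
bool publishBeforeConstruct() {
    void* raw = ::operator new(sizeof(Singleton));  // step 1: allocate raw memory
    pInstance = static_cast<Singleton*>(raw);       // step 3, moved early: publish
    bool visibleEarly = (pInstance != 0);           // another thread's first check
                                                    // would skip the lock here
    new (raw) Singleton;                            // step 2: construct (writes x, y, z)
    return visibleEarly;
}

void destroyInstance() {
    pInstance->~Singleton();
    ::operator delete(pInstance);
    pInstance = 0;
}
```

In a single thread the end result is indistinguishable from the source order, which is exactly why the transformation is permitted; the danger is only the window between the early store and the constructor.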

Mark B

But I thought that using `volatile`, on the contrary, keeps compilers from making the optimizations you mentioned. According to the C++03 Standard, §1.9/6: "The observable behavior of the abstract machine is its sequence of reads and writes to `volatile` data and calls to library I/O functions" and §1.9/7: "Accessing an object designated by a `volatile` lvalue, modifying an object... are all side effects... At certain specified points ... called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place" – undermind Sep 12 '16 at 17:36
