While reading this article about the Double-Checked Locking Pattern (DCLP) in C++, I reached the place (page 10) where the authors demonstrate one of the attempts to implement DCLP "correctly" using volatile variables:
class Singleton {
public:
    static volatile Singleton* volatile instance();
private:
    static volatile Singleton* volatile pInstance;
};

// from the implementation file
volatile Singleton* volatile Singleton::pInstance = 0;

volatile Singleton* volatile Singleton::instance() {
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            volatile Singleton* volatile temp = new Singleton;
            pInstance = temp;
        }
    }
    return pInstance;
}
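(The Lock class isn't shown above; in the article it is a scoped, RAII-style guard that acquires a mutex in its constructor and releases it in its destructor, with the parameters omitted for simplicity. Just so the snippet reads as self-contained, a minimal modern stand-in could look roughly like this; the mutex name is my own:)

#include <mutex>

std::mutex singletonMutex;          // assumed global mutex guarding pInstance

// RAII guard: acquire the mutex on construction, release it on destruction.
class Lock {
public:
    Lock()  { singletonMutex.lock(); }
    ~Lock() { singletonMutex.unlock(); }
};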
After this example there is a passage that I don't understand:
First, the Standard’s constraints on observable behavior are only for an abstract machine defined by the Standard, and that abstract machine has no notion of multiple threads of execution. As a result, though the Standard prevents compilers from reordering reads and writes to volatile data within a thread, it imposes no constraints at all on such reorderings across threads. At least that’s how most compiler implementers interpret things. As a result, in practice, many compilers may generate thread-unsafe code from the source above.
and later:
... C++’s abstract machine is single-threaded, and C++ compilers may choose to generate thread-unsafe code from source like the above, anyway.
These remarks relate to execution on a uniprocessor, so this is definitely not about cache-coherence issues.
If the compiler can't reorder reads and writes to volatile data within a thread, how can it reorder reads and writes across threads for this particular example, thus generating thread-unsafe code?
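For concreteness, the multi-threaded use I have in mind is something like the following (my own sketch, not from the article; the std::thread scaffolding is just for illustration). Two threads call Singleton::instance() concurrently, and the danger the article warns about is that the second thread might see a non-null pInstance before the first thread has finished constructing the object:

#include <thread>

void worker() {
    // Both threads race to be the first caller of instance().
    volatile Singleton* volatile s = Singleton::instance();
    (void)s;  // a real caller would go on to use the Singleton here
}

int main() {
    std::thread a(worker);  // may be the thread that executes "new Singleton"
    std::thread b(worker);  // may see pInstance != 0 while a is still constructing
    a.join();
    b.join();
}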