4

Anything wrong with the following Singleton implementation?

Foo& Instance() {
    if (foo) {
        return *foo;
    }
    else {
        scoped_lock lock(mutex);

        if (foo) {
            return *foo;
        }
        else {
            // Don't do foo = new Foo;
            // because that line *may* be a 2-step 
            // process comprising (not necessarily in order)
            // 1) allocating memory, and 
            // 2) actually constructing foo at that mem location.
            // If 1) happens before 2) and another thread
            // checks the foo pointer just before 2) happens, that 
            // thread will see that foo is non-null, and may assume 
            // that it is already pointing to a a valid object.
            //
            // So, to fix the above problem, what about doing the following?

            Foo* p = new Foo;
            foo = p; // Assuming no compiler optimisation, can pointer 
                     // assignment be safely assumed to be atomic? 
                     // If so, on compilers that you know of, are there ways to 
                     // suppress optimisation for this line so that the compiler
                     // doesn't optimise it back to foo = new Foo;?
        }
    }
    return *foo;
}
GingerPlusPlus
  • 5,336
  • 1
  • 29
  • 52
moog
  • 81
  • 1
  • 5
  • Read this if you haven't already: http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html – Alex B Aug 17 '10 at 03:34
  • You need Foo * volatile p = new Foo. You may also need foo to be volatile but I am not sure about that. – Tomek Aug 17 '10 at 06:29
  • @Tomek "volatile" is not a barrier, it wouldn't work. Don't use "volatile" for multi-threading (except the stop flag to check if a loop needs to stop). – Alex B Aug 17 '10 at 13:01
  • @Alex B: I now that. But if you want to finish writing to p before reading it you still need it. It is just a part of the answer (which I probably should have said more clearly). – Tomek Aug 17 '10 at 13:48

7 Answers7

4

No, you cannot even assume that foo = p; is atomic. It's possible that it might load 16 bits of a 32-bit pointer, then be swapped out before loading the rest.

If another thread sneaks in at that point and calls Instance(), you're toasted because your foo pointer is invalid.

For true security, you will have to protect the entire test-and-set mechanism, even though that means using mutexes even after the pointer is built. In other words (and I'm assuming that scoped_lock() will release the lock when it goes out of scope here (I have little experience with Boost)), something like:

Foo& Instance() {
    scoped_lock lock(mutex);
    if (foo != 0)
        foo = new Foo();
    return *foo;
}

If you don't want a mutex (for performance reasons, presumably), an option I've used in the past is to build all singletons before threading starts.

In other words, assuming you have that control (you may not), simply create an instance of each singleton in main before kicking off the other threads. Then don't use a mutex at all. You won't have threading problems at that point and you can just use the canonical don't-care-about-threads-at-all version:

Foo& Instance() {
    if (foo != 0)
        foo = new Foo();
    return *foo;
}

And, yes, this does make your code more dangerous to people who couldn't be bothered to read your API docs but (IMNSHO) they deserve everything they get :-)

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • 1
    Scary. Do you know on what machine architectures that can happen? – moog Aug 17 '10 at 03:09
  • No, but I don't know any architectures that provide a 128-bit char either, yet the standard says that's possible :-) – paxdiablo Aug 17 '10 at 03:10
  • @paxdiablo: According to Wikipedia, `sizeof(char)` is defined to be 1. – Borealid Aug 17 '10 at 03:15
  • Yes, sizeof(char) is 1 byte but that does _not_ necessarily mean 8 bits. ISO uses 'octet' for an 8 bit value. The size of your char (and your byte) is defined in limits.h (for C, no idea of the equivalent header for C++) as CHAR_BITS (from memory). – paxdiablo Aug 17 '10 at 03:17
  • @paxdiablo: I surrender. On another note, I now consider the C++ standard to be retarded. – Borealid Aug 17 '10 at 03:22
  • @Borealid: There are real processors (especially DSPs) that don't support 8-bit data types, so if you required `char` to be 8-bits, you'd make C either impossible or horribly inefficient on those processors. – Jerry Coffin Aug 17 '10 at 04:34
  • @Jerry funny that C99 makes C *less* portable to such platforms (`uint8_t`). – Alex B Aug 17 '10 at 07:07
  • 1
    @Alex B: not really (§7.18.1.1/3): "These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, it shall define the corresponding typedef names." What isn't portable is the (huge amount of) code that assumes these types will always be defined. – Jerry Coffin Aug 17 '10 at 11:05
3

Why not keep it simple?

Foo& Instance()
{
    scoped_lock lock(mutex);

    static Foo instance;
    return instance;
}

Edit: In C++11 where threads is introduced into the language. The following is thread safe. The language guarantees that instance is only initialized once and in a thread safe manor.

Foo& Instance()
{
    static Foo instance;
    return instance;
}

So its lazily evaluated. Its thread safe. Its very simple. Win/Win/Win.

Martin York
  • 257,169
  • 86
  • 333
  • 562
1

This depends on what threading library you're using. If you're using C++0x you can use atomic compare-and-swap operations and write barriers to guarantee that double-checked locking works. If you're working with POSIX threads or Windows threads, you can probably find a way to do it. The bigger question is why? Singletons, it turns out, are usually unnecessary.

Max Lybbert
  • 19,717
  • 4
  • 46
  • 69
  • Logging class :( That seems to *really* call for Singletons. I'v been trying to stay away from the S word, but it seems logging is one area where it's useful. I would very, very much love to be proven wrong! – moog Aug 17 '10 at 03:47
  • Funny thing. In a program I wrote at work, I did all my logging through a global variable, as that seemed like a legitimate use for a Singleton. However, since the logging was to the Windows Event Log, it turned out that there was no penalty to creating local objects that wrote to the Event Log, and Windows would synchronize everything for me. – Max Lybbert Aug 17 '10 at 08:05
  • 1
    Just to add - I am quite disappointed to see that not all the c++0x threading goodies are available on the compiler that comes with VS2010. – moog Aug 17 '10 at 10:46
  • @moog: Logging classes *don't* need to be singletons. They can be plain globals (allowing you to have multiple loggers, which can actually be useful or even necessary, and allowing you to initialize and destroy a logger freely) – jalf Aug 21 '10 at 13:34
0

Why don't you just use a real mutex ensuring that only one thread will attempt to create foo?

Foo& Instance() {
    if (!foo) {
        pthread_mutex_lock(&lock);
        if (!foo) {
            Foo *p = new Foo;
            foo = p;
        }
        pthread_mutex_unlock(&lock);
    }
    return *foo;
}

This is a test-and-test-and-set lock with free readers. Replace the above with a reader-writer lock if you want reads to be guaranteed safe in a non-atomic-replacement environment.

edit: if you really want free readers, you can write foo first, and then write a flag variable fooCreated = 1. Checking fooCreated != 0 is safe; if fooCreated != 0, then foo is initialized.

Foo& Instance() {
    if (!fooCreated) {
        pthread_mutex_lock(&lock);
        if (!fooCreated) {
            foo = new Foo;
            fooCreated = 1;
        }
        pthread_mutex_unlock(&lock);
    }
    return *foo;
}
Borealid
  • 95,191
  • 9
  • 106
  • 122
  • @paxdiablo: I think I said that in my answer. But I've added a simple method with a lock-free fast path for non-atomic-replacement systems. – Borealid Aug 17 '10 at 03:12
  • That's better. I don't think C++ can allow out-of-order execution of `foo=/fooCreated=` due to sequence points, so it should be okay. – paxdiablo Aug 17 '10 at 03:15
  • Thank you Borealid and paxdiablo! Indeed, the fooCreated method looks safe. Disturbingly safe. :) I am now thinking of scenarios in which it can fail! I'm just a paranoid peanut! :) – moog Aug 17 '10 at 03:22
  • 1
    @paxdiablo, why should sequence points matter? Compiler can do whatever it pleases as long as the observed behaviour is the same within a single thread. There is nothing that stops it from assigning `fooCreated` before `foo` (you may find foo is assigned somewhere midway through inlined Foo constructor). – Alex B Aug 17 '10 at 03:24
  • @Alex B: What if we move fooCreated = 1 out of the inner if block, to just before the unlock line? – moog Aug 17 '10 at 03:29
  • 2
    @moog Actually I've had *similar* code fail miserably in a multi-threaded environment. You'd be surprised what the compiler does to maximize the use of CPU pipeline. I've learned to never assume thread-safety in a C++ program unless platform/compiler-specific locks or atomics are used. – Alex B Aug 17 '10 at 03:29
  • 2
    @moog, wouldn't matter. I've read this and it's eye-opening: http://www.drdobbs.com/cpp/210600279 and http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html – Alex B Aug 17 '10 at 03:35
  • @Alex B: Got your point - thanks for the warnings! I'll also go check out the 2 links. Regarding my earlier comment about moving fooCreated = 1 out of the inner if block - even that might be futile if the compiler optimises in a way that changes the code execution order. And that doesn't seem to be such a far-fetched idea for the code in question, I think. – moog Aug 17 '10 at 03:36
  • 2
    Exceptions may cause the lock to stay lock. Used RAII to get around this problem. – Martin York Aug 17 '10 at 03:44
  • Will this not even work if you mark the variables volatile? Surely C++ won't re-order execution then? – paxdiablo Aug 17 '10 at 04:02
  • 1
    @paxdiablo Nope, compiler can't reorder volatile read/writes, but can reorder volatile and non-volatile ones. "volatile" as a barrier is an MS extension. In fact, the code I've been bitten by attempted to use "volatile" to (unsuccessfully) avoid reordering http://stackoverflow.com/questions/2535148/volatile-qualifier-and-compiler-reorderings – Alex B Aug 17 '10 at 04:23
  • With reference to my original code posting, how about sandwiching the "foo = p;" line between _WriteBarrier() and _ReadBarrier() calls? – moog Aug 17 '10 at 10:43
  • @moog, then you need the barriers around check if foo is null – Alex B Aug 25 '10 at 01:46
0

the new operator in c++ always invovle 2-steps process :
1.) allocating memory identical to simple malloc
2.) invoke constructor for given data type

Foo* p = new Foo;
foo = p;

the code above will make the singleton creation into 3 step, which is even vulnerable to problem you trying to solve.

YeenFei
  • 3,180
  • 18
  • 26
0

Thanks for all your input. After consulting Joe Duffy's excellent book, "Concurrent Programming on Windows", I am now thinking that I should be using the code below. It's largely the code from his book, except for some renames and the InterlockedXXX line. The following implementation uses:

  1. volatile keyword on both the temp and "actual" pointers to protect against re-ordering from the compiler.
  2. InterlockedCompareExchangePointer to protect against reordering from the CPU.

So, that should be pretty safe (... right?):

template <typename T>
class LazyInit {
public:
    typedef T* (*Factory)();
    LazyInit(Factory f = 0) 
        : factory_(f)
        , singleton_(0)
    {
        ::InitializeCriticalSection(&cs_);
    }

    T& get() {
        if (!singleton_) {
            ::EnterCriticalSection(&cs_);
            if (!singleton_) {
                T* volatile p = factory_();
                // Joe uses _WriterBarrier(); then singleton_ = p;
                // But I thought better to make singleton_ = p atomic (as I understand, 
                // on Windows, pointer assignments are atomic ONLY if they are aligned)
                // In addition, the MSDN docs say that InterlockedCompareExchangePointer
                // sets up a full memory barrier.
                ::InterlockedCompareExchangePointer((PVOID volatile*)&singleton_, p, 0);
            }
            ::LeaveCriticalSection(&cs_);
        }
        #if SUPPORT_IA64
        _ReadBarrier();
        #endif
        return *singleton_;
    }

    virtual ~LazyInit() {
        ::DeleteCriticalSection(&cs_);
    }
private:
    CRITICAL_SECTION cs_;
    Factory factory_;
    T* volatile singleton_;
};
moog
  • 383
  • 5
  • 13
  • don't ask questions in the answer section. Either edit the question, or post a new one, as appropriate – jalf Aug 21 '10 at 13:33
0

It has nothing wrong with your code. After the scoped_lock, there will be only one thread in that section, so the first thread that enters will initialize foo and return, and then second thread(if any) enters, it will return immediately because foo is not null anymore.

EDIT: Pasted the simplified code.

Foo& Instance() {
  if (!foo) {
    scoped_lock lock(mutex);
    // only one thread can enter here
    if (!foo)
        foo = new Foo;
  }
  return *foo;
}
leiz
  • 3,984
  • 2
  • 23
  • 17
  • If a second thread is doing the first test (`if (!foo)`) while the first thread is doing the assignment (`foo = new Foo`), then this can fail unless the assignment is guaranteed to be atomic. – Adrian McCarthy Mar 16 '11 at 17:58
  • that is exactly why there is another test after lock is acquired. – leiz Mar 18 '11 at 08:02