I have the following example:

#include <mutex>
#include <set>
#include <thread>
#include <chrono>
#include <functional>

template <typename T>
class container
{
public:
    std::mutex _lock;
    std::set<T> _elements;

    void add(T element)
    {
        _elements.insert(element);
    }

    void remove(T element)
    {
        _elements.erase(element);
    }
};

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    cont1._lock.lock();
    std::this_thread::sleep_for(std::chrono::seconds(1));

    cont2._lock.lock();

    cont1.remove(value);
    cont2.add(value);

    cont1._lock.unlock();
    cont2._lock.unlock();
}

int main()
{
    container<int> cont1, cont2;

    cont1.add(1);
    cont2.add(2);

    std::thread t1(exchange, std::ref(cont1), std::ref(cont2), 1);
    std::thread t2(exchange, std::ref(cont2), std::ref(cont1), 2);

    t1.join();
    t2.join();

    return 0;
}

In this case I'm experiencing a deadlock. But when I use std::lock_guard instead of manually locking and unlocking the mutexes, I have no deadlock. Why?

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    std::lock_guard<std::mutex>(cont1._lock);
    std::this_thread::sleep_for(std::chrono::seconds(1));

    std::lock_guard<std::mutex>(cont2._lock);

    cont1.remove(value);
    cont2.add(value);
}
Amadeusz
  • This does not answer your question directly, but you can avoid the deadlock here by using [`std::lock`](http://en.cppreference.com/w/cpp/thread/lock) to lock both mutexes at once. You can (and should) still transfer ownership of the lock to a `lock_guard` after that using `std::adopt_lock`. – ComicSansMS Apr 21 '17 at 09:28
  • @Amadeusz. Why have you switched the arguments around? This is your reason for the deadlock (or are you simulating it deliberately). – Werner Erasmus Apr 21 '17 at 09:44
  • @WernerErasmus: Hint: See question title :-) – Kerrek SB Apr 21 '17 at 10:14

2 Answers


Your two code snippets are not comparable. The second snippet locks and immediately unlocks each mutex as the temporary lock_guard object is destroyed at the semicolon:

std::lock_guard<std::mutex>(cont1._lock);  // temporary object

The correct way to use lock guards is to make scoped variables of them:

{
    std::lock_guard<std::mutex> lock(my_mutex);

    // critical section here

}   // end of critical section, "lock" is destroyed, calling mutex.unlock()
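
As a side note (a sketch against the question's container class, not part of the original code): if the second snippet is rewritten with named guards, the mutexes really are held across the sleep, and the original deadlock comes back, because the two threads still lock in opposite order:

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    std::lock_guard<std::mutex> lock1(cont1._lock);  // actually holds the mutex now
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::lock_guard<std::mutex> lock2(cont2._lock);  // each thread arrives here holding
                                                     // the other's first mutex: deadlock
    cont1.remove(value);
    cont2.add(value);
}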

(Note that there is another common error that's similar but different:

std::mutex mu;
// ...
std::lock_guard<std::mutex>(mu);

This declares a variable named mu (just like int(n);). However, this code is ill-formed, because std::lock_guard does not have a default constructor. It would compile with, say, std::unique_lock (which is default-constructible), but it would still not end up locking anything.)
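
For illustration, a minimal sketch of that second pitfall, written with std::unique_lock so that it actually compiles:

std::mutex mu;

void f()
{
    std::unique_lock<std::mutex>(mu);  // declares a new local variable named "mu",
                                       // default-constructed; the global mutex
                                       // above is never locked
}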

Now to address the real problem: how do you lock multiple mutexes at once, in a consistent order? It may not be feasible to agree on a single lock order across an entire codebase, or even across a future user's codebase, or even in local cases, as your example shows. In such cases, use the std::lock algorithm:

std::mutex mu1;
std::mutex mu2;

void f()
{
    std::lock(mu1, mu2);

    // order below does not matter
    std::lock_guard<std::mutex> lock1(mu1, std::adopt_lock);        
    std::lock_guard<std::mutex> lock2(mu2, std::adopt_lock);
}
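
Applied to the exchange() from the question (a sketch, assuming the container type from the question and that cont1 and cont2 are distinct objects), that looks roughly like:

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    // lock both mutexes as one atomic operation; the order in which the
    // two threads name the containers no longer matters
    std::lock(cont1._lock, cont2._lock);
    std::lock_guard<std::mutex> lock1(cont1._lock, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(cont2._lock, std::adopt_lock);

    cont1.remove(value);
    cont2.add(value);
}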

In C++17 there is a new variadic lock guard template called scoped_lock:

void f_17()
{
    std::scoped_lock lock(mu1, mu2);

    // ...
}

The constructor of scoped_lock uses the same algorithm as std::lock, so the two can be used compatibly.
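
For completeness, the question's exchange() sketched with scoped_lock (C++17, again assuming distinct containers):

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    std::scoped_lock lock(cont1._lock, cont2._lock);  // locks both without deadlock,
                                                      // unlocks at end of scope
    cont1.remove(value);
    cont2.add(value);
}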

Kerrek SB
  • Hmmm, I missed that (didn't read, and just assumed he was using lock_guard correctly). – Werner Erasmus Apr 21 '17 at 09:24
  • Could I suggest a small change from "It is not feasible" to "If it is not feasible"? The important issue is about order of nesting not order of locking and it's quite possible to implement multiple co-operating libraries that lock things on entry and unlock them on exit and never conflict. The classic defeat to that strategy is polymorphism. As soon as polymorphic methods start explicitly or (worse!) implicitly acquiring locks while their callers are doing the same the goose is likely cooked on a prescriptive locking order. – Persixty Apr 21 '17 at 09:53
  • @DanAllen: Hm, I already say "across an entire codebase"; it is certainly feasible to agree on an order in specific situations. And the OP's example shows that order is just as important as nesting. Maybe it'd be better to mention nesting specifically as another source of problems? – Kerrek SB Apr 21 '17 at 10:12
  • 1
    The only deadlock problem is nesting. By nesting I mean holding one (or more) locks and prospecting others. I don't disagree with your point except it implies all large code bases need to resort to try-retreat anti-deadlock strategies I just think that is too negative. My point is that so long as individual sub-systems keep themselves to themselves there doesn't need to be a problem. It's when (e.g. polymorphism) that keeping self-to-self breaks down you get trouble... – Persixty Apr 21 '17 at 10:40

While Kerrek SB's answer is entirely valid, I thought I'd throw an alternative hat in the ring. std::lock, or any try-and-retreat deadlock-avoidance strategy, should be seen as a last resort from a performance perspective.

How about:

#include <functional> // std::less
#include <mutex>      // std::mutex, std::unique_lock, std::defer_lock
#include <thread>     // std::this_thread::sleep_for
#include <chrono>     // std::chrono::seconds

static const std::less<void*> l{}; // comparison object. See note below.

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    if (&cont1 == &cont2) {
        return; // aliasing protection.
    }
    std::unique_lock<std::mutex> lock1(cont1._lock, std::defer_lock);
    std::unique_lock<std::mutex> lock2(cont2._lock, std::defer_lock);
    if (l(&cont1, &cont2)) { // in effect a portable &cont1 < &cont2
        lock1.lock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        lock2.lock();
    } else {
        lock2.lock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        lock1.lock();
    }
    cont1.remove(value);
    cont2.add(value);
}

This code uses the memory address of the objects to determine an arbitrary but consistent lock order. This approach can (of course) be generalized.
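
As an illustration of that generalisation, a hypothetical helper (the name lock_in_address_order is mine, not an existing API) that locks any two distinct mutexes in address order might look like:

#include <functional>
#include <mutex>

// Hypothetical helper: acquire two distinct mutexes in a consistent,
// address-based order, so every call site agrees on the same order.
void lock_in_address_order(std::mutex& a, std::mutex& b)
{
    // std::less on pointers guarantees a total order over addresses,
    // unlike a raw < on pointers to unrelated objects.
    if (std::less<void*>()(&a, &b)) {
        a.lock();
        b.lock();
    } else {
        b.lock();
        a.lock();
    }
}

exchange() could then do its aliasing check, call the helper, and adopt each mutex into a lock_guard with std::adopt_lock.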

Note also that in reusable code the aliasing protection is necessary, because the version where cont1 is cont2 would otherwise try to lock the same mutex twice. std::mutex cannot be assumed to be a recursive lock, and normally isn't.

NB: The use of std::less<void*> ensures compliance, as std::less guarantees a consistent total ordering of addresses, whereas a raw (&cont1 < &cont2) between unrelated objects is technically unspecified behavior. Thanks Kerrek SB!

Persixty
  • `&cont1 < &cont2` has undefined behaviour, unfortunately :-S – Kerrek SB Apr 21 '17 at 10:13
  • Does your cast compile? – Kerrek SB Apr 21 '17 at 10:55
  • @KerrekSB Added some notes about portability. I think it would be a shame to rule out a very efficient strategy that will work on almost all platforms for a technically correct but obscure concern about the bit pattern of pointers. I can only think of this failing on some strange paged memory platform. Do you know any physical platform it wouldn't work for? – Persixty Apr 21 '17 at 10:56
  • I certainly know that compilers will aggressively assume that you don't have UB and consider code paths with UB unreachable, which may have surprising effects :-) I can't speak to this particular construction of course. I could however recommend the use of `std::less`, which is guaranteed to provide a total ordering. – Kerrek SB Apr 21 '17 at 10:57
  • @KerrekSB. I wasn't aware of that stronger guarantee on `std::less` (`std::less<>` is what is recommended over `std::less<T>`, for whatever reason...). That's what is needed here. Fixed! – Persixty Apr 21 '17 at 12:14
  • Here's a very detailed paper that measures the performance of various multi-lock locking strategies, including the "ordering" algorithm suggested here: http://howardhinnant.github.io/dining_philosophers.html Under high contention scenarios, "ordering" is _not_ the high performance solution. – Howard Hinnant Apr 21 '17 at 13:48
  • Well, `less` is available in C++11, `less<>` only in C++14 (and only in sufficiently recent versions of C++14 that implement the relevant DRs). – Kerrek SB Apr 21 '17 at 14:14
  • @HowardHinnant Actually, looking at that second graph, it's at medium-sized contention that ordered loses out, and extrapolation suggests it scales well (at infinity). However, it all depends on the circumstances. Dining Philosophers is a good model for teaching deadlock but often a poor model for exploring performance. The idea that the number of locks is proportional to threads and each thread only ever tries the same two locks is very unlike most systems. There are often many more locks than threads but (potentially) a handful of hotly contended locks. – Persixty Apr 21 '17 at 14:19
  • @kerrekSB at that I'll go with . – Persixty Apr 21 '17 at 14:21
  • English arguments do not make ordering the fastest algorithm. Measure. – Howard Hinnant Apr 21 '17 at 14:22
  • @HowardHinnant Measuring unrealistic things tells us nothing. Logical arguments tell us a lot. I guarantee that in low contention, but where the number of locks required is large, ordering will fail because it has an implicit sort, whereas all the others will sail through. It depends what problem you're solving. – Persixty Apr 21 '17 at 14:24
  • @HowardHinnant Try this http://ideone.com/JBqhIK. #define MODEL 1 or 2 at the top. It's a toy, and a different problem- and platform-specific case, but the ordered lock model is significantly faster than `std::lock`. If you trim it back to 2 threads (as in the OP) the effect is still there but lessened. This is the poor philosophers problem! More philosophers than cutlery! PS: Just trying to point out that one model and set of measurements shouldn't be over-extrapolated. – Persixty Apr 21 '17 at 17:15
  • Nice measurement! I'm getting ordered to be about 20% faster than `std::lock` with this test. I slightly altered the test to have 4 containers and set `t1{c1, c2}`, `t2{c2, c3}`, `t3{c3, c4}`, `t4{c4, c1}`. The results changed to `std::lock` 330% faster than ordered. This might be summarized by "ordered can be vulnerable to cyclic dependencies among the mutexes." I also noted that very little parallel execution happens in the test as you've set it up. Using either strategy consumes about 1 CPU. In my altered test, and with `std::lock`, about 3.5 CPUs are utilized, but only 1 CPU with ordered. – Howard Hinnant Apr 21 '17 at 17:52
  • @HowardHinnant May be a feature of the Dining Philosophers. But as I said I think it's a poor model of most applications. And yes very little parallelism in my test. 2 locks in the whole system and all threads need both to progress! Technically zero parallelism just a massive contention bottleneck. – Persixty Apr 21 '17 at 18:09
  • Dining Philosophers may be a good model of a financial application that assigns a mutex to each account, and locks two accounts to implement a payment from one account to another. In this scenario the mutex dependency is a run-time structure, not a compile-time structure, but could easily create a cyclic structure at run-time: A pays B which pays C which pays D ... which pays A. – Howard Hinnant Apr 21 '17 at 18:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/142328/discussion-between-dan-allen-and-howard-hinnant). – Persixty Apr 22 '17 at 08:35