
This code demonstrates that the mutex is being shared between two threads, but one thread has it nearly all of the time.

#include <thread>
#include <mutex>
#include <iostream>

#include <unistd.h>

int main ()
{
    std::mutex m;

    std::thread t ([&] ()
    {
        while (true)
        {
            {
                std::lock_guard <std::mutex> thread_lock (m);

                sleep (1); // or whatever
            }
            std::cerr << "#";
            std::cerr.flush ();
        }
    });

    while (true)
    {
        std::lock_guard <std::mutex> main_lock (m);
        std::cerr << ".";
        std::cerr.flush ();
    }
}

Compiled with g++ 7.3.0 on Ubuntu 18.04 (kernel 4.15.0-23-generic).

The output is a mix of # and . characters, showing that the mutex is being shared, but the pattern is surprising. It typically looks something like this:

.......#####..........................##################......................##

i.e. thread_lock holds the mutex for a very long time. After several or even tens of seconds, main_lock gets it (briefly), then thread_lock gets it back and keeps it for ages. Calling std::this_thread::yield() doesn't change anything.

Why are the two threads not equally likely to acquire the lock, and how can I make the mutex be shared in a balanced fashion?

spraff
  • Mutexes don't guarantee fairness. They just keep threads from interfering with each other. – Pete Becker Oct 30 '18 at 16:16
  • The critical sections are different: In one case, the I/O is part of the critical section, in the other case it is not. – EOF Oct 30 '18 at 16:17
  • @EOF why would I/O affect things? – spraff Oct 30 '18 at 16:29
  • @spraff -- because I/O typically involves the OS deciding to suspend a thread while some hardware operation takes place. When the thread is suspended it's not going to lock the mutex, and some other thread gets more of a chance to lock it. – Pete Becker Oct 30 '18 at 16:46
  • It's your code that decides what to do while holding the mutex. If your code isn't doing the work you want it to do, that's on you. – David Schwartz Oct 30 '18 at 17:40

2 Answers


std::mutex isn't designed to be fair. It doesn't guarantee any particular locking order; you're either lucky enough to get the lock or you aren't.

If you want more fairness, consider using a std::condition_variable like so:

#include <thread>
#include <mutex>
#include <iostream>
#include <condition_variable>

int main ()
{
    std::mutex m;
    std::condition_variable cv;
    std::thread t ([&] ()
    {
        while (true)
        {
            std::unique_lock<std::mutex> lk(m);
            std::cerr << "#";
            std::cerr.flush ();
            cv.notify_one(); // wake the main thread if it is waiting
            cv.wait(lk);     // release the mutex and sleep until notified back
        }
    });

    while (true)
    {
        std::unique_lock<std::mutex> lk(m);
        std::cerr << ".";
        std::cerr.flush ();
        cv.notify_one(); // wake the worker thread if it is waiting
        cv.wait(lk);     // release the mutex and sleep until notified back
    }
}
Hatted Rooster
  • Thank you, but this just adds a new mystery. Since both loops lock the same mutex, why don't we have the same problem again? – spraff Oct 30 '18 at 16:34
  • @spraff Because now we are courteously waiting until the other thread notifies us that we're allowed to resume. (`wait` unlocks the mutex.) The approach without `cv` just grabs the mutex if it's free without asking or considering other threads. – Hatted Rooster Oct 30 '18 at 16:36
  • That still doesn't explain why this solves starvation. If the `std::thread` loop is lucky enough to get the lock nearly every time, can't its `wait` just consume the condition and greedily loop again while the main thread is still starved? Is there a guarantee or is the result just a lucky platform-dependent accident? – spraff Oct 30 '18 at 16:43
  • @spraff `notify_one` happens before the next wait of the same thread, so the other thread will always be notified. – Hatted Rooster Oct 30 '18 at 17:00
  • This looks like the wrong way to use a condition variable to me. Remember that notifications can be lost. A condition variable must check an actual condition, i.e. some variable must match the required condition. Otherwise, if a notification arrives after the waiter is awake but before it goes back to sleep, that notification is *gone*. – Zan Lynx Oct 30 '18 at 18:04
  • @ZanLynx That's another way to use a cv, and that's what the overload of `wait` is for. Here we just use it to make sure one thread goes after the other and so forth. – Hatted Rooster Oct 30 '18 at 18:09
  • The above code has non-UB race conditions, in that the compiler is free to compile it to code that locks up and does nothing. You *must* deal with spurious wakeups in a cv, and if you ignore the return value of the non-lambda `.wait` your code is broken. – Yakk - Adam Nevraumont Oct 30 '18 at 19:25
  • @ZanLynx It cannot be lost in this case because `notify` is used from within the critical section. – Oct 30 '18 at 19:35
  • @Yakk-AdamNevraumont AFAIK `.wait` doesn't return anything though, so you'd have to make sure something is checked. The only case where `wait` would continue execution is after a `notify_one`, which the code covers, or after a spurious wake-up. You're indeed correct in that. How does that, in this piece of code, affect things though? If the `wait` wakes up it'll just go to the next iteration and wait for the lock, or grab it and notify the other thread. Don't get me wrong, I'm sure you know better than me regarding the standard here, I'm just curious how the compiler is allowed to. – Hatted Rooster Oct 30 '18 at 19:38
  • @SombreroChicken Sorry, I was thinking of interval waits, which have a return value. You are right; your code is equivalent to not using a cv at all: if all the `.wait`s instantly wake up spuriously, the notify stuff does nothing. Still a code smell; the cv is a fancy "sleep" or "yield" type instruction that indicates you want the other thread to do something, but doesn't actually ensure it. – Yakk - Adam Nevraumont Oct 30 '18 at 19:41
  • @Yakk-AdamNevraumont Yeah exactly, and in the cases where it doesn't wake up you "kind of" have a fair distribution. Although your solution is much more robust and elegant. – Hatted Rooster Oct 30 '18 at 19:42
  • This code doesn't allow either thread to make forward progress while the other one isn't running, and doesn't guarantee strict alternation. I don't like showing people example code that's this fragile without clear warnings, as people may model their own code on it thinking that it's a good example of sound coding practices. Using `wait` without any predicate is almost always wrong, and this manages to find the one weird edge case where it isn't. (This code isn't guaranteed to work, by the way. Consider a platform that always has one spurious wakeup when a thread first waits.) – David Schwartz Oct 30 '18 at 19:49
  • In fact, I downvoted. This code purports to fix code that relies on mutex fairness, which is not guaranteed. But it only works if there aren't lots of spurious wakeups, which also is not guaranteed. So it's code that happens to work on some platforms as a suggested improvement to code that happens not to work on some platforms. That doesn't strike me as a good way to teach proper coding practices. – David Schwartz Oct 30 '18 at 19:51
  • @DavidSchwartz I doubt OP is going to run this on TempleOS but I understand your reasoning, to each their own, friend. – Hatted Rooster Oct 30 '18 at 19:57
  • I appreciate the in-depth discussion going on, but I'm having trouble following. Have we arrived at consensus? – spraff Oct 30 '18 at 22:41
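
One point the comments above keep returning to is that waiting on a condition variable without a predicate relies on spurious wakeups never breaking the notify/wait ping-pong. A sketch of a predicate-based variant (not from any of the posts above; the worker_turn flag is an illustrative addition) makes the turn-taking explicit:

#include <thread>
#include <mutex>
#include <iostream>
#include <condition_variable>

int main ()
{
    std::mutex m;
    std::condition_variable cv;
    bool worker_turn = false; // protected by m: says whose turn it is to print

    std::thread t ([&] ()
    {
        while (true)
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&]{ return worker_turn; }); // re-checks the flag on spurious wakeups
            std::cerr << "#";
            std::cerr.flush ();
            worker_turn = false;
            cv.notify_one();
        }
    });

    while (true)
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&]{ return !worker_turn; });
        std::cerr << ".";
        std::cerr.flush ();
        worker_turn = true;
        cv.notify_one();
    }
}

Here each thread proceeds only when it is explicitly its turn, so a spurious wakeup simply re-checks the flag and goes back to sleep, and the output alternates strictly.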

Making std::mutex fair would have a cost. And in C++ you don't pay for what you don't ask for.

You could write a locking object where the party releasing the lock cannot be the next one to get it. More advanced: you could write one where that restriction only applies if someone else is waiting.

Here is a quick, untested stab at a fair mutex:

#include <mutex>
#include <condition_variable>
#include <thread>
#include <cstddef>

struct fair_mutex {
  void lock() {
    auto l = internal_lock();
    lock(l);
  }
  void unlock() {
    auto l = internal_lock();
    in_use = false;
    if (waiting != 0) {
      // someone is waiting: the releasing thread is barred from re-acquiring immediately
      loser = std::this_thread::get_id();
    } else {
      loser = {};
    }
    cv.notify_one();
  }
  bool try_lock() {
    auto l = internal_lock();
    // also refuse if we are the "loser"; otherwise try_lock could block inside lock(l)
    if (in_use || loser == std::this_thread::get_id()) return false;
    lock(l); // the wait predicate is already true here, so this does not block
    return true;
  }
private:
  void lock(std::unique_lock<std::mutex>& l) {
    ++waiting;
    cv.wait( l, [&]{ return !in_use && std::this_thread::get_id() != loser; } );
    in_use = true;
    --waiting;
  }
  std::unique_lock<std::mutex> internal_lock() const {
    return std::unique_lock<std::mutex>(m);
  }
  mutable std::mutex m;      // protects all of the state below
  std::condition_variable cv;
  std::thread::id loser;     // thread that most recently released while others were waiting
  bool in_use = false;
  std::size_t waiting = 0;
};

It is "fair" in that if you have two threads contending over a resource, they will take turns. If someone is waiting on the lock, the thread giving it up won't immediately grab it back.

This is, however, threading code; I might read it over, but I wouldn't trust my first attempt at writing anything like this.
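
As a usage sketch (assuming the untested fair_mutex above compiles as written), the original program could simply swap std::mutex for it, since std::lock_guard only needs lock()/unlock():

#include <thread>
#include <iostream>

#include <unistd.h>

// fair_mutex as defined above

int main ()
{
    fair_mutex m;

    std::thread t ([&] ()
    {
        while (true)
        {
            {
                std::lock_guard <fair_mutex> thread_lock (m);
                sleep (1); // long critical section, as in the question
            }
            std::cerr << "#";
            std::cerr.flush ();
        }
    });

    while (true)
    {
        std::lock_guard <fair_mutex> main_lock (m);
        std::cerr << ".";
        std::cerr.flush ();
    }
}

With this policy the main thread can no longer be starved indefinitely: whenever it is waiting, the worker that releases the lock is barred from immediately taking it back.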

You could extend this (at increasing cost) to be n-way fair (or even omega-fair), where if there are up to N threads waiting, they all get their turn before the releasing thread gets another chance. A FIFO-style sketch of that idea follows.
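
A minimal sketch of that stronger guarantee is a ticket-style lock (the ticket_mutex name and layout below are illustrative, not from the answer above): each locker takes a ticket, and unlock hands the lock to the next ticket in line, so all current waiters are served before the releaser can get back in.

#include <mutex>
#include <condition_variable>
#include <cstdint>

struct ticket_mutex {
  void lock() {
    std::unique_lock<std::mutex> l(m);
    std::uint64_t my_ticket = next_ticket++;              // take a number
    cv.wait(l, [&]{ return my_ticket == now_serving; });  // wait for our turn
  }
  void unlock() {
    {
      std::lock_guard<std::mutex> l(m);
      ++now_serving;                                       // serve the next ticket
    }
    cv.notify_all(); // broadcast: only the thread holding the next ticket proceeds
  }
private:
  std::mutex m;
  std::condition_variable cv;
  std::uint64_t next_ticket = 0;
  std::uint64_t now_serving = 0;
};

The releasing thread always joins the back of the queue when it re-locks, which is the n-way behaviour described above; the price is a notify_all broadcast on every unlock.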

Yakk - Adam Nevraumont