Why it seems that mutex acquired via std::lock_guard still take effect for a little while after its scope

Question

As we know, the correct usage of std::lock_guard is like this RAII style:

void increase_decrease() {
    std::lock_guard<std::mutex> guard(global_mutex);

    static const int times = 50;
    for (int i = 0; i < times; i++) {
        global_data ++;
    }
    for (int i = 0; i < times; i++) {
        global_data --;
    }
}

Here, my point is not about how to use std::lock_guard or mutex.

In the code below, we deliberately use std::lock_guard in a wrong way. (That is to put it into a block before the critical section.)

16 threads are created to add 1 to and subtract 1 from a global int variable, which is initialized as 0, for 50 times.

std::lock_guard is called in a block, and the block is before the critical section (WRONG WAY! Never do something like this!). Mutex will be released after the block(wrong usage, again), following RAII-style mechanism. So, when it goes into the critical section, no lock is there.

#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
#include <vector>

int global_data = 0;
std::mutex global_mutex;
void increase_decrease() {
    // XXX: INCORRECT USAGE! INCORRECT USAGE! INCORRECT USAGE!
    {
        std::lock_guard<std::mutex> guard(global_mutex);
    }
    // // XXX: uncomment to sleep for a litter while
    // std::this_thread::sleep_for(std::chrono::milliseconds(10));
    static const int times = 50;
    for (int i = 0; i < times; i++) {
        global_data ++;
    }
    for (int i = 0; i < times; i++) {
        global_data --;
    }
}

void try_mutex() {
    const int num_workers = 16;
    std::vector<std::thread> workers;

    auto start = std::chrono::system_clock::now();
    for (int i = 0; i < num_workers; i++) {
        std::thread t(increase_decrease);
        workers.push_back(std::move(t));
    }
    for (auto &t: workers) {
        t.join();
    }
    auto end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end-start;
    std::cout << "global_data: " << global_data
        << ", elapsed second: " << elapsed_seconds.count();
}

int main() {
    try_mutex();
}

I found that whether sleeping for 10ms causes different results.

Without sleep, stdout of 20 calls of main is:

global_data: 0, elapsed second: 0.000363
global_data: 0, elapsed second: 0.000359
global_data: 0, elapsed second: 0.000349
global_data: 0, elapsed second: 0.000345
global_data: 0, elapsed second: 0.000352
global_data: 0, elapsed second: 0.000323
global_data: 0, elapsed second: 0.000619
global_data: 0, elapsed second: 0.000431
global_data: 34, elapsed second: 0.000405
global_data: -14, elapsed second: 0.000415
global_data: 0, elapsed second: 0.000497
global_data: 0, elapsed second: 0.000366
global_data: 0, elapsed second: 0.000413
global_data: 0, elapsed second: 0.000406
global_data: 0, elapsed second: 0.000353
global_data: 0, elapsed second: 0.000363
global_data: 0, elapsed second: 0.000361
global_data: 0, elapsed second: 0.000358
global_data: 0, elapsed second: 0.000348
global_data: 0, elapsed second: 0.000367

However, if we uncomment the sleep, stdout of 20 calls of main is:

global_data: 44, elapsed second: 0.011108
global_data: 15, elapsed second: 0.010645
global_data: 25, elapsed second: 0.012905
global_data: 27, elapsed second: 0.012914
global_data: 9, elapsed second: 0.012871
global_data: 46, elapsed second: 0.012836
global_data: 44, elapsed second: 0.011307
global_data: -2, elapsed second: 0.01286
global_data: 77, elapsed second: 0.012853
global_data: 43, elapsed second: 0.011984
global_data: 0, elapsed second: 0.011134
global_data: -3, elapsed second: 0.011571
global_data: 49, elapsed second: 0.012438
global_data: 43, elapsed second: 0.011552
global_data: -20, elapsed second: 0.010807
global_data: 0, elapsed second: 0.010514
global_data: 0, elapsed second: 0.010916
global_data: -44, elapsed second: 0.012829
global_data: 50, elapsed second: 0.011759
global_data: 9, elapsed second: 0.012873

The probability that global_data equals to 0 is much larger in the first case than in the second one. I tried many times. It's not just a coincidence.

So, it seems that there is a chance for mutex to take effect for a little while after the block where it got acquired via std::lock_guard. Why?

Thank you.

That's not what's happening, When that scope is left, the mutex is unlocked on that thread; period. Ignoring the massive race condition your coded yourself, without the sleep at all, any/all of those threads can easily finish those trivial short loops before the next thread is even scheduled. The sleep just makes them stomp on each other more likely. Jam that counter to something meaningful, like a half-million or so. Regardless, in reality-based code you would move the scope lock within each for-loop body, or better still, get rid of it entirely and use a `std::atomic_int` for `global_data`. — WhozCraig, Jun 08 '19 at 11:42
It depends on how thread::sleep_for() is implemented. But you have good evidence that it falls back to the OS and integrates well with its thread scheduler. All of these threads complete their sleep at the exact same time so the odds for the threading race bug to have its undesirable side-affects are much larger. Pretty normal. — Hans Passant, Jun 08 '19 at 12:09

LWimsey · Answer 1 · 2019-06-08T12:41:43.650

4

std::lock_guard is called in a block which is before the critical section. Mutex should be released after the block, following RAII-style mechanism.

No, the mutex is released at the end of the block you put it in. And that is before accessing global_data.. Hence it is not protected, and that is undefined behavior. The results you see are all possible with undefined behavior and you should not spent too much effort trying to understand those outcomes.

If you remove the braces around lock_guard, it will work just fine.

edited Jun 08 '19 at 12:41

answered Jun 08 '19 at 11:26

LWimsey

6,189
2
25
53

Sorry for my unclear expression. I mean that `std::lock_guard` is called in a block, and the block ends before the critical section (incorrect usage, of course). My point is not about how to use std::lock_guard, but about the inconsistent results with and without sleep. Thank you. – Yunqing Gong Jun 08 '19 at 12:29
I understand, but only after you modified the question :) .. I will add a few lines – LWimsey Jun 08 '19 at 12:40

Paul Sanders · Answer 2 · 2019-06-08T11:31:51.573

3

You have a data race on variable global_data (because you release the lock before you start manipulating it). Because of this, the results of running your code are unpredictable.

If you change global_data from int to std::atomic <int> then you get the same output both with and without the sleep (all zeroes).

Live demo

edited Jun 08 '19 at 11:31

answered Jun 08 '19 at 11:24

Paul Sanders

24,133
4
26
48

Why it seems that mutex acquired via std::lock_guard still take effect for a little while after its scope

2 Answers2