
I have written this piece of code as a test:

#include <iostream>
#include <thread>
#include <mutex>

int counter = 0;

auto inc(int a) {
    for (int k = 0; k < a; ++k)     
        ++counter;
}

int main() {

    auto a = std::thread{ inc, 100000 };
    auto b = std::thread{ inc, 100000 };

    a.join();
    b.join();

    std::cout << counter;
    return 0;
}

The counter variable is global, so with two threads a and b incrementing it concurrently I'd expect to see a data race. Yet the output is 200000, not a random number. Why?

This code is a fixed version that uses a mutex so that the global variable can be accessed by only one thread at a time. The result is still 200000.

std::mutex mutex;

auto inc(int a) {
    mutex.lock();
    for (int k = 0; k < a; ++k)     
        ++counter;
    mutex.unlock(); 
}

The point is this: the mutex solution gives me 200000, which is correct because only one thread at a time can access counter. But why does the non-mutex solution still show 200000?

NathanOliver
Emma Rossignoli
  • The behavior is undefined; it can do anything, including giving the result you expected. It may not on another platform, with another compiler, or at a later date. Perhaps the increment operation is atomic on your platform? – François Andrieux Jul 16 '18 at 20:30
  • For this application, you could just use `std::atomic<int> counter;`. It seems your platform has atomic `int` increment; `std::atomic` can portably take advantage of that. – François Andrieux Jul 16 '18 at 20:32
  • @FrançoisAndrieux I use VS2017 and an Intel i7 7700K. My platform doesn't make the increment operation atomic! I have written the same code in Delphi, and the non-mutex version gives me random values – Emma Rossignoli Jul 16 '18 at 20:32
  • If you insist on speculating about undefined behavior you need to share the exact compiler version and compilation flags used. – François Andrieux Jul 16 '18 at 20:35
  • Even aside from the undefined behavior question, even at `-O1`, both gcc and clang implement your function as a single `add` instruction... so it's unlikely you'd see those clash... – Barry Jul 16 '18 at 20:37
  • With VS2015, I get 200000 every time in Release and other values in Debug. The whole thing might be getting optimized out. And there's no telling what other tricks the compiler might perform if it see that UB. – François Andrieux Jul 16 '18 at 20:38
  • @FrançoisAndrieux Just using `std::atomic<int> counter;` is likely to be slow if you have multiple threads writing to it back and forth like that. I'd recommend having a local `int localCounter = 0;` for the loop, then at the end of it all writing `counter += localCounter`, though of course with this trivial example it doesn't really matter – Justin Jul 16 '18 at 20:38

3 Answers


The problem here is that the window for your data race is extremely small. Any modern compiler will convert your inc function to counter += a, so the race window is tiny; most probably, by the time you start the second thread, the first one has already finished.

This doesn't make it any less undefined behavior, but it explains the result you are seeing. You can make the compiler less aggressive about optimizing your loop, e.g. by making a, k, or counter volatile; then your data race should become evident.

Matteo Italia

Data races are undefined behavior, which means that any program execution is valid, including one that happens to do what you want. In this case, the compiler is probably optimizing your loop into counter += a, and the first thread finishes before the second thread starts, so they never actually conflict.

Marc Aldorasi

Data Races are Undefined Behavior

You can't make assertions about what should happen when a data race is involved. Your assertion that there should be some visible evidence of data tearing (i.e. the final result is 178592 or something) is false, because there's no reason to expect any such result.

The behavior you're observing can probably be explained by compiler optimizations.

The following code

auto inc(int a) {
    for (int k = 0; k < a; ++k)     
        ++counter;
}

Can be optimized legally according to the C++ standard into

auto inc(int a) {
    counter += a;
}

Note how the number of writes into counter has been optimized from O(a) to O(1). That's pretty significant. What this means is that it's possible (and probable) that the write to counter is finishing before the second thread has even been initialized, making the observation of data tearing statistically improbable.

If you want to force this code to behave the way you expect, consider marking the variable counter as volatile:

#include <iostream>
#include <thread>
#include <mutex>

volatile int counter = 0;

auto inc(int a) {
    for (int k = 0; k < a; ++k)     
        ++counter;
}

int main() {

    auto a = std::thread{ inc, 100000 };
    auto b = std::thread{ inc, 100000 };

    a.join();
    b.join();

    std::cout << counter;
    return 0;
}

Bear in mind that this is still undefined behavior, and should not be relied upon in any kind of production-destined code! However, this code is more likely to replicate the race condition you're trying to invoke.

You might also try larger numbers than 100000, since on modern hardware, even without optimizations, a loop of 100000 can be pretty fast.

Xirema
  • This is the answer I was looking for! I was puzzled because C++ did this properly (the non-mutex way), while with another language, Delphi, I was able to reproduce the random output numbers due to undefined behavior (also the non-mutex way). – Emma Rossignoli Jul 16 '18 at 20:43
  • Also, yes, with the volatile keyword I don't get the exact number anymore! – Emma Rossignoli Jul 16 '18 at 20:44
  • Even without the optimization: the cost of creating a thread is high, and the cost of running through the loop is low. The first thread could simply have finished all the iterations before the second thread had been created. – Martin York Jul 16 '18 at 20:53
  • @MartinYork Hence my last paragraph, where I suggested increasing the numbers involved. It's worth noting though that people tend to overestimate just how "costly" thread creation is. It depends a lot on the OS + underlying hardware, but even if we assume incrementing a variable is a single CPU cycle (unlikely), it's still typical that a thread could be fully created in 100000 CPU cycles. – Xirema Jul 16 '18 at 21:00
  • @Xirema: The comment was not really at you (just for others to give more context). I see you understand the issue. I would say the most costly part about thread creation is allocation of the new stack area. Memory management is the bottleneck here (even if you go to the OS rather than language memory handler). The rest seems relatively trivial. – Martin York Jul 16 '18 at 21:26