
This is my code (Godbolt link):

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

int main(int, char **) {
  volatile bool resource = true;

  std::atomic_bool stop{false};
  std::atomic_int working{0};

  std::thread controller([&]() {
    // Inform all workers resource is going to be destroyed.
    stop.store(true, std::memory_order_relaxed); // #1

    // Wait for all existing workers to finish.
    while (working.load(std::memory_order_relaxed)) { // #2
      std::this_thread::yield();
    }

    // Destroy resource.
    resource = false; // #3
  });

  std::vector<std::thread> workers;
  for (int i = 0; i < 64; ++i) {
    workers.emplace_back([&]() {
      working.fetch_add(1, std::memory_order_relaxed); // #4
      if (stop.load(std::memory_order_relaxed)) { // #5
        working.fetch_sub(1, std::memory_order_relaxed); // #6
        return;
      }
      
      // Access resource
      if (!resource) { // #7
        std::cerr << "Data race detected: resource is not available."
                  << std::endl;
        abort();
      }
      working.fetch_sub(1, std::memory_order_relaxed); // #8
    });
  }

  for (auto &worker : workers) worker.join();
  controller.join();

  std::cout << "no data race detected." << std::endl;
  return 0;
}

The case is like this:

  1. Multiple worker threads access a common resource (read-only).
  2. One controller thread will destroy the common resource after informing the workers and waiting for the existing workers to finish.

This describes a common scenario: workers may be created at any time, but a controller (resource manager) might intend to destroy the resource at any time. To avoid a data race, the controller needs to wait a while for the current workers to finish and to prevent any new workers from starting.

I used several C++ atomics to get it working, but I have no confidence in the memory ordering, although it seems to work well on my x86 server.

  1. In the above code I just use relaxed ordering, which is apparently not enough, but I don't know which ordering is right here.
  2. By the way, are there any websites or tools to test for memory-reordering issues across different platforms?
Monte
  • Code should be in the question. And often non-working code is not sufficient to explain what the code is supposed to do; you need to explain that too. And apart from the title, which is too generic, there is no question – 463035818_is_not_an_ai Jun 29 '23 at 09:23
  • Note `volatile` is NOT related to thread safety! It is used for memory locations that can be modified by things other than the processor (e.g. interaction with memory-mapped hardware) – Pepijn Kramer Jun 29 '23 at 09:23
  • Welcome to Stack Overflow. Please read [the help pages](http://stackoverflow.com/help), take the SO [tour], and read [ask]. Then also please read [how to write the "perfect" question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/), especially its [checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). Lastly please try to keep your questions *self-contained*, with their own [mre]. – Some programmer dude Jun 29 '23 at 09:23
  • As for your problem, if you have a single common resource then I would recommend you create an object with access functions to the resource. These functions use a single (shared) mutex to lock access to the resource. – Some programmer dude Jun 29 '23 at 09:25
  • Lockless programming is incredibly difficult to get right. I mean, a condition variable and a mutex would probably even be more performant than your `while (working.load(...))` loop – selbie Jun 29 '23 at 09:55
  • You have initialized `working` to zero, so the first thread can terminate before anything else starts. – Marek R Jun 29 '23 at 10:00
  • @Someprogrammerdude Thanks for your advice. A shared mutex is indeed suitable here. I just want to dive deeper into lock-free programming and the C++ memory model. – Monte Jun 30 '23 at 07:16
  • @MarekR You are right, but it does not hurt, as long as there is some chance for workers to run before the controller. – Monte Jun 30 '23 at 07:17
  • @PepijnKramer That's true, but I think it's irrelevant to this case. – Monte Jun 30 '23 at 07:18
  • I think it is irrelevant to this case too, but I see that mistake with `volatile` being made often. And so it distracts from other issues you might have; still better to remove it (unless of course there is a hardware reason to use it) – Pepijn Kramer Jun 30 '23 at 07:20
  • In typical code like this, the stop flag is really just "advisory" and is fine with relaxed ordering; you then need acquire/release ordering on `working` to ensure that all uses of `resource` happen before its destruction. But your code has an extra subtlety - if a worker should start up after the controller has already observed `working` to equal 0, then it *must not* access the resource at all. The stop flag is the only way it could know this, so its ordering becomes crucial as well. – Nate Eldredge Jul 02 '23 at 07:31
  • A simpler version would just initialize `working` to 64, and remove the code where the worker threads increment it. That ensures that we really do wait for every worker to release the resource before destroying it. So you may want to consider which example you would really want to ask about. – Nate Eldredge Jul 02 '23 at 07:32
  • To your question #2, [ThreadSanitizer](https://clang.llvm.org/docs/ThreadSanitizer.html) is a valuable tool for catching data races. Unfortunately it misses this one, apparently fooled by the relaxed atomic accesses that don't actually help in any way. I reported it: https://github.com/google/sanitizers/issues/1667 – Nate Eldredge Jul 03 '23 at 07:11
  • @selbie: C++20 has `.wait()` and `.notify_one()` member functions for `std::atomic`, wrapping similar mechanisms for OS-assisted sleep/wake such as Linux `futex`, which can and should be used instead of spin-wait loops when the wait time is typically not very short. https://en.cppreference.com/w/cpp/atomic/atomic/wait – Peter Cordes Jul 03 '23 at 16:23
  • @NateEldredge Thank you, I understand your suggestion to initialize `working` to 64, however that is not my intention. Actually, I have simplified the logic of my real code, where the `controller` might exit and destroy resources at any time, and `workers` are created at random times. – Monte Jul 05 '23 at 04:24

1 Answer


Briefly: #1, #2, #4, and #5 need to be seq_cst. This is the only ordering that prevents StoreLoad reordering (moving a load before a store).

We can see that a data race would potentially occur if #2 were reordered before #1. In that case, it could happen that working.load() returns 0 (all existing workers have finished), but then another worker starts and immediately checks the stop flag (#5), getting the value false. Then it will proceed to access the resource, racing with the controller thread which is now about to destroy it.

Likewise, reordering #4 and #5 also results in a potential data race. Then the worker could see the stop flag as false, but not yet have incremented working to indicate that it has started. If the controller runs at this point, it would proceed to destroy the resource.

Also, #8 needs to be release, since it is essential that it not be reordered before #7. We would similarly argue that #2 needs to be acquire, but in fact it already has to be seq_cst as shown above, which includes all the properties of acquire.

#6 can stay as relaxed. Any thread which reaches #6 is not going to access the resource at all, and so it cannot be involved in a data race.

If I get some time later, I will add a formal proof that these orderings would eliminate the data race.

Nate Eldredge