
I understand the basic rules for memory ordering in C++11, especially release-acquire ordering. I have a big chunk of memory shared between two threads, where I do not need atomicity, but I want to ensure that all the changes made by one thread eventually become visible in the other, especially on platforms with a relaxed memory model.

Is it ok to simply use an atomic guard variable only to trigger memory synchronization? E.g.,

std::atomic<bool> guardVar;
char *shared_mem=get_shared_mem();

(thread 1)
while(true) {
  happens_many_things();
  do_whatever_I_want_with_shared_mem();
  guardVar.store(false, std::memory_order_release);
}

(in thread 2)
while(true) {
  guardVar.load(std::memory_order_acquire);
  read_shared_mem_no_problem_if_inconsistent();
}

Again, it is not a problem if thread 2 reads a "half-ready" state in the middle of do_whatever_I_want_with_shared_mem(), I just want to ensure that I get all the changes written by thread 1 after a well defined point.

Based on this article it should work, but I do not see solutions like this on the net, and it is not easy to test whether it really does what I intend.

Is it ok? If it is, is there a more elegant way?

Ferenc

1 Answer


it is not a problem if thread 2 reads a "half-ready" state in the middle of do_whatever_I_want_with_shared_mem()

This is an error: you cannot access shared memory from multiple threads while one of them is modifying the data. The C++ standard calls this a data race, and it leads to undefined behavior.

Access between the two threads needs to be synchronized, but the way you use the std::atomic is incorrect. In thread 1, the release store is immediately followed by further writes to the same data on the next loop iteration, and in thread 2, nothing stops the reads from overlapping with those writes. The acquire load orders memory operations but does not provide mutual exclusion, so you are still dealing with a data race.

To ensure that your shared memory is only accessed by one thread at a time, guardVar can technically be used like this:

std::atomic<bool> guardVar{false};

(thread 1)
while(true) {

    while (guardVar.exchange(true, std::memory_order_acquire));  // LOCK

    happens_many_things();
    do_whatever_I_want_with_shared_mem();

    guardVar.store(false, std::memory_order_release);  // UNLOCK
}

(in thread 2)
while(true) {

    while (guardVar.exchange(true, std::memory_order_acquire)); // LOCK

    read_shared_mem_no_problem_if_inconsistent();

    guardVar.store(false, std::memory_order_release);  // UNLOCK
}

But since this uses a std::atomic as a mutex in a rather inefficient way (note the spinning), you really should use a std::mutex.

Update:

It is still possible to use your shared memory without locking, but then it is your responsibility to ensure that each individual object that is accessed in shared memory is data race free (std::atomic objects qualify).

Then you more or less get the behavior you describe in your question where a second thread may see a "half-ready" state (some objects updated, others are not). Without synchronization the second thread cannot really know when the updates by the first thread are done, but at least it is safe to read/write to the data race free objects concurrently.

LWimsey
  • Hmm. You are right that this is indeed undefined behavior by the standard. Imagine that I am simulating a framebuffer, which one thread continuously writes to in random access, and another thread wants to periodically read out the whole buffer to process it. Tearing is not a problem. How can I solve this without any kind of locking in either of the threads? – Ferenc Apr 04 '17 at 22:01
  • @Ferenc That scenario was indeed part of your question.. I'll update the answer – LWimsey Apr 04 '17 at 22:21
  • In the framebuffer scenario I mentioned in my previous comment, it is not really feasible to create ~1M atomic objects, one for each byte in the buffer. Anyway, I accept your answer as you correctly answered my original question. Maybe I should rephrase and ask a new question for this concrete framebuffer case. – Ferenc Apr 05 '17 at 22:36
  • To warn the followers, I have found [this article](https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong) about data races. Now I see two possible solutions: 1) write the reading thread in assembly, and keep the release-store semantics in the first loop to force a memory fence instruction, or 2) declare the framebuffer as a volatile array, thus preventing the compiler from performing the nasty optimizations described in the referenced article. Neither option is standards-compliant, but I am out of imagination as to how the compiler could cheat on me then. :) – Ferenc Apr 05 '17 at 22:40
  • @Ferenc I did not want to present the possibility of using atomic objects as a feasible solution; it was more like an alternative (and valid) approach in case synchronization is not used. You are dealing with shared memory that is accessed by multiple threads, so there are some limitations to what you can do – LWimsey Apr 05 '17 at 22:53