Why does this spinlock require memory_order_acq_rel instead of just acquire?

Question

// spinlockAcquireRelease.cpp

#include <atomic>
#include <thread>

class Spinlock{
  std::atomic_flag flag;
public:
  Spinlock(): flag(ATOMIC_FLAG_INIT) {}

  void lock(){
    while(flag.test_and_set(std::memory_order_acquire) ); // line 12
  }

  void unlock(){
    flag.clear(std::memory_order_release);
  }
};

Spinlock spin;

void workOnResource(){
  spin.lock();
  // shared resource
  spin.unlock();
}


int main(){

  std::thread t(workOnResource);
  std::thread t2(workOnResource);

  t.join();
  t2.join();

}

In the notes, it is said:

In case more than two threads use the spinlock, the acquire semantic of the lock method is not sufficient. Now the lock method is an acquire-release operation. So the memory model in line 12 [the call to flag.test_and_set(std::memory_order_acquire)] has to be changed to std::memory_order_acq_rel.

Why does this spinlock work with 2 threads but not with more than 2? What example code would cause this spinlock to misbehave?

Source: https://www.modernescpp.com/index.php/acquire-release-semantic

I don't think it is required, but I am not confident enough about it for an answer. Also note the comments on the source page, which seem to be equally confused about it. — user17732522, Jan 12 '22 at 04:51
That page has a comment section where you can ask the question directly to the author. — Raymond Chen, Jan 12 '22 at 05:19

LWimsey · Accepted Answer · 2022-01-12T22:30:34.407

4

std::memory_order_acq_rel is not required.

Mutex synchronization is between 2 threads: one releasing the data and another acquiring it.
As such, it is irrelevant whether other threads perform a release or acquire operation.

Perhaps it is more intuitive (and efficient) if the acquire is handled by a standalone fence:

void lock(){
  while(flag.test_and_set(std::memory_order_relaxed) )
    ;
  std::atomic_thread_fence(std::memory_order_acquire);
}

void unlock(){
  flag.clear(std::memory_order_release);
}

Multiple threads can spin on flag.test_and_set, but only one of them manages to read the cleared value and set it again (in a single atomic operation). Only that thread acquires the protected data after the while-loop.

edited Jan 12 '22 at 22:30

answered Jan 12 '22 at 10:48


Shouldn't the `atomic_thread_fence` theoretically be a `memory_order_acq_rel` barrier (although it should work on all mainstream platforms)? Indeed, a crazy platform can theoretically reorder the `test_and_set` after the barrier (I am especially concerned with speculative execution, for example). After all, C++11 [did not forbid circular atomic dependencies](https://en.cppreference.com/w/cpp/atomic/memory_order#Relaxed_ordering) (which is crazy too). – Jérôme Richard Jan 12 '22 at 17:37
@JérômeRichard In C++11, circular atomic dependencies were only possible (better: not forbidden) with relaxed operations, that is, without `atomic_thread_fence`. C++ defines the 'synchronize-with' relationship, where the acquire part can be a relaxed load followed by an acquire fence. For more details, check the 'fences' paragraph in the C++ standard. It uses different terminology, but it basically says that no operation after the acquire fence can be reordered with the relaxed load sequenced before the fence. – LWimsey Jan 12 '22 at 19:58
So you mean the author is wrong in that statement? – Huy Le Jan 13 '22 at 02:10
@HuyLe The statement the author makes about `memory_order_acq_rel` is not correct, but it is a harmless mistake. – LWimsey Jan 13 '22 at 11:36
