Does an atomic acquire synchronize with mutex lock release?

Question

I have an object that stores some settings in an unordered_map with string keys and variant values. As my library might be used from several threads and reads will very likely outnumber writes by a large margin, I've thought about a copy on write implementation where the "get" operation is lock free and the "put" operation is in a critical section, as in the example:

class Cfg {
    using M = unordered_map<string,X>;
    shared_ptr<const M> data;
    mutex write_lock;
public:
    X get(string key) {
        shared_ptr<const M> cur_ver = atomic_load_explicit(&data, memory_order_acquire);
        // Extract the value from the immutable *cur_ver
    }
    void put(string key, X value) {
        lock<muted> wlock(write_lock);
        // No need for the atomic load here because of the lock
        shared_ptr<const M> cur_ver = data;
        shared_ptr<const M> new_ver = ;// create new map with value included
        // QUESTION: do I need this store to be atomic? Is it even enough?
        atomic_store_explicit(&data, new_ver, memory_order_release);
    }
}

I am reasonably confident that the design works, as long as the acquire/release synchronization affects also the pointed-to data and not just the pointer value. However, my questions are as follows:

Is the atomic store within the lock required for this to work?
Or will the atomic acquire synchronize with the mutex unlock which is a "release" operation?

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

It's required if you want your get function to always return the latest value. It can occur that you have multiple reads and write occurs in the same clock time. Using atomic memory order ensures the order that write is before read.

If you mix non-atomic stores and atomic loads, it is undefined behavior. This thread also discussed it. You potentially have one write after another write. You can have data race if you use non-atomic instruction.

According cppreference

memory_order_acquire

The operation is ordered to happen once all accesses to memory in the releasing thread (that have visible side effects on the loading thread) have happened.

memory_order_release

The operation is ordered to happen before a consume or acquire operation, serving as a synchronization point for other accesses to memory that may have visible side effects on the loading thread.

I don't specifically need other threads to read the latest version guaranteed every time, but I would like to avoid UB. From your answer I gather that the mutex unlock operation does not establish a happens-before with the atomic load, so if I made the very last line non atomic, so I have to keep it. — Javier Martín, Jun 09 '20 at 00:41

score 1 · Accepted Answer · answered Jun 09 '20 at 09:00

will the atomic acquire synchronize with the mutex unlock which is a "release" operation?

No, in order for an acquire operation to synchronize-with a release operation, the acquire operation has to observe the changes of the release operation (or some change in a potential release sequence headed by that operation).

So yes, you need the atomic store inside the lock. There is no guarantee that get will "see" the latest value from put since you only use acquire/release, so there is not total order between the store and load operations. If you want that guarantee you have to use memory_order_seq_cst.

As a side-note - this implementation is most likely not lock-free, because in most library implementations atomic_load_explicit for shared_ptr is not lock-free. The problem is that you have to load the pointer and dereference that pointer to increment the ref-counter, in one atomic operation. This is not possible on most architectures, so atomic_load_explicit is usually implemented using a lock.

So, if I understand this correctly, without the `atomic_store` it does not work, and even with it it's possible that the reader sees an old version due to the lack of `seq_cst`... I am fine with the latter, since the copy-on-write implementation means that the reader will see _some_ consistent version. However, regarding the lack of a atomic lock-free operation in the shared_ptr, would it be better to just use a lock for reading too? — Javier Martín, Jun 09 '20 at 09:39
Yes, without the `atomic_store` it does not work (correctly). I don't think you would gain much by making the operations seq-cst, I just mentioned it for completeness. The critical section for acquiring the `shared_ptr` is very short, so I would expect it to scale better than performing all map operations under a lock, but you should test this yourself. Alternatively you can use a RW-Lock or a concurrent hashmap (for example my [xenium](https://github.com/mpoeter/xenium) library provides some). — mpoeter, Jun 09 '20 at 11:17

Does an atomic acquire synchronize with mutex lock release?

2 Answers2