How to multiply two values and store the result atomically?

Question

Say I have the following global variables in my code:

std::atomic<uint32_t> x(...);
std::atomic<uint32_t> y(...);
std::atomic<uint32_t> z(...);

My task is to multiply x and y and then store the result in z:

z = x * y

I'm aware that the naive approach of calling store() and load() on each object is completely wrong:

z.store(x.load() * y.load()); // wrong

This way I'm performing three separate atomic instructions: another thread might slip through and change one of the values in the meantime.

I could opt for a compare-and-swap (CAS) loop, but it would guarantee atomicity only for swapping the old value of z with the new one (x*y): I'm still not sure how to perform the whole operation in a single, atomic step.

I'm also aware that wrapping x, y and z inside a struct and making it atomic is not feasible here, as the struct doesn't fit inside a single 64-bit register. The compiler would employ locks under the hood (please correct me if I'm wrong here).

Is this problem solvable only with a mutex?

It can't be done at the assembly level, so not at the C++ level either without a mutex. — Michael Chourdakis, Jul 07 '19 at 11:01
Maybe this is a more fruitful question: Why do you need a third atomic variable that is just the result of a computation on two other variables? Point is that whenever you need the product of x and y, you can atomically read the two variables and compute it, unless there's some other magic going on that you didn't write. — Ulrich Eckhardt, Jul 07 '19 at 11:25
@UlrichEckhardt `z`, the variable that holds the result has to be atomic because it is written/read by other threads. — Ignorant, Jul 07 '19 at 11:50
Well, I'm not questioning that. However, I don't understand (and thus dare to challenge) why the variable z has to exist at all! — Ulrich Eckhardt, Jul 07 '19 at 11:58
@UlrichEckhardt good point. So you are suggesting to avoid `z` and use the product directly where needed, right? However, this way you still have the trouble of reading `x` and `y` atomically in a single shot... or not? — Ignorant, Jul 07 '19 at 12:12
Yes you do. But they're 64 bit together, so that should be feasible, right? — Ulrich Eckhardt, Jul 07 '19 at 12:13
You're right indeed, but: what if the variables were `uint64_t` instead, or the operation was way more complex like `x * y * w * k / j`? No nitpicking here, I'm just very curious :) — Ignorant, Jul 07 '19 at 12:27
@Ignorant For an operation with more terms, a solution is transactional memory. An operation is a "transaction" in the database sense, providing ACID guarantees (for the most part). A transaction attempt either commits (in which case all actions appear to be atomic) or fails (in which case changes are rolled back to maintain consistent state) and must be retried (like a CAS loop). TM is an open research topic. Hardware TM support is only just now becoming a reality. Software TMs have existed for a while. TBH I don't know how they work, but I'm guessing they use a lot of CAS? — Humphrey Winnebago, Jul 09 '19 at 07:09

Acorn · Accepted Answer · 2019-07-07T12:38:22.127

I'm still not sure how to perform the whole operation in a single, atomic step.

It will only be possible to do so if your architecture supports something like "32-bit atomic multiplication" (and you would have to do it outside the C++ standard's facilities) or an atomic that is wide enough to perform an RMW (read-modify-write) operation on 64 bits.

I'm also aware that wrapping x, y and z inside a struct and making it atomic is not feasible here, as the struct doesn't fit inside a single 64-bit register.

Even if they did fit, you would still need to do an RMW operation, since it is unlikely that you have atomic multiplication anyway.
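To illustrate the read-modify-write point, here is a minimal sketch under an assumption that does not hold for the question's 32-bit types: suppose x and y were only 16 bits wide, so that x, y and a 32-bit z could share one 64-bit word. The layout, the variable `state` and the helpers `pack_xyz` and `update_z` are hypothetical names chosen for this example. Even in that case the multiplication is not one atomic instruction; it is a CAS loop that retries whenever another thread changed the word in between.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: bits 63..48 = x, bits 47..32 = y, bits 31..0 = z.
std::atomic<std::uint64_t> state{0};

constexpr std::uint64_t pack_xyz(std::uint16_t x, std::uint16_t y, std::uint32_t z) {
    return (static_cast<std::uint64_t>(x) << 48) |
           (static_cast<std::uint64_t>(y) << 32) | z;
}

// Read-modify-write: recompute z = x * y from one consistent snapshot and
// retry if another thread modified the word in the meantime.
void update_z() {
    std::uint64_t expected = state.load();
    std::uint64_t desired = 0;
    do {
        const auto x = static_cast<std::uint32_t>((expected >> 48) & 0xFFFFu);
        const auto y = static_cast<std::uint32_t>((expected >> 32) & 0xFFFFu);
        // Keep x and y as they are, replace only the low 32 bits (z).
        desired = (expected & 0xFFFFFFFF00000000ULL) | (x * y);
        // On failure, compare_exchange_weak reloads `expected` for the next try.
    } while (!state.compare_exchange_weak(expected, desired));
}
```

Readers of `state` always see an x, y and z that belong together, but the cost is that every writer has to go through the same 64-bit word.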

Is this problem solvable only with a mutex?

If your architecture supports a lock-free 64-bit atomic (check with std::atomic<uint64_t>::is_always_lock_free), you can keep both x and y together in it and perform operations on them as needed.
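A minimal sketch of that idea, assuming C++17 (for is_always_lock_free) and a hypothetical layout with x in the high half and y in the low half. The names `xy`, `pack`, `set_xy` and `product` are made up for this example. It follows the suggestion from the comments of dropping z and computing the product from a single atomic snapshot wherever it is needed.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: x in the high 32 bits, y in the low 32 bits.
std::atomic<std::uint64_t> xy{0};

// Refuse to compile on targets where a 64-bit atomic would need a lock.
static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
              "64-bit atomics are not lock-free on this target");

constexpr std::uint64_t pack(std::uint32_t x, std::uint32_t y) {
    return (static_cast<std::uint64_t>(x) << 32) | y;
}

// Publish a new (x, y) pair with a single atomic store.
void set_xy(std::uint32_t x, std::uint32_t y) {
    xy.store(pack(x, y));
}

// Read x and y with a single atomic load, then multiply the local copies.
// The result wraps modulo 2^32, like any uint32_t multiplication.
std::uint32_t product() {
    const std::uint64_t snapshot = xy.load();
    const auto x = static_cast<std::uint32_t>(snapshot >> 32);
    const auto y = static_cast<std::uint32_t>(snapshot);
    return x * y;
}
```

Updating only one of the two halves would again require a CAS loop on `xy`, since a plain store replaces both values at once.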

what if the variables were uint64_t instead, or the operation was way more complex like x * y * w * k / j?

Assuming that your architecture does not have 128-bit lock-free atomics, you cannot load that much data atomically. Either design your program so that it does not need the (complete) operation to be atomic to begin with, use locks, or find a way to avoid sharing the state.
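For the "use locks" option, a minimal sketch could look like the following. The struct `Shared` and the functions `set_inputs`, `multiply_and_store` and `read_z` are hypothetical names for this example. It only works if every thread that touches x, y or z takes the same mutex, and the variables no longer need to be std::atomic.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical shared state; all accesses must hold `m`.
struct Shared {
    std::mutex m;
    std::uint64_t x = 0;
    std::uint64_t y = 0;
    std::uint64_t z = 0;
};

Shared shared;

// Update both inputs as one unit.
void set_inputs(std::uint64_t x, std::uint64_t y) {
    std::lock_guard<std::mutex> lock(shared.m);
    shared.x = x;
    shared.y = y;
}

// z = x * y as one critical section: no other thread can change x or y
// between the reads and the write while the lock is held.
void multiply_and_store() {
    std::lock_guard<std::mutex> lock(shared.m);
    shared.z = shared.x * shared.y;
}

std::uint64_t read_z() {
    std::lock_guard<std::mutex> lock(shared.m);
    return shared.z;
}
```

The same pattern extends to longer expressions such as x * y * w * k / j, because the critical section can read as many variables as it needs.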

Note that even operations you perceive as atomic are not free: in an SMP system they still put pressure on the cache hierarchy.
