
In C++, say I have the following 16-byte type, which supports atomic operations on CPUs that have the cmpxchg16b instruction:

#include <atomic>
#include <cstddef>
#include <cstdio>

struct foo
{
    std::size_t _x;
    void* _y;

    foo(std::size_t x = 3, void* y = nullptr): _x(x), _y(y){}
};

int main()
{
    std::atomic<foo> f1;
    foo f2;
    foo f3(2, new int(4));
    f1.compare_exchange_strong(f2, f3);
    std::printf("is always lock free %s\n", std::atomic<foo>::is_always_lock_free ? "true" : "false" );
}

However, say that I want to do an atomic increment on _x only, such as via fetch_add. How do I apply such an atomic operation without using std::atomic<size_t>? I don't want to use that here because it makes the larger foo type not trivially copyable, which prevents me from using the 16-byte compare-and-exchange and gives the following error:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/atomic:923:3: error: 
      _Atomic cannot be applied to type 'foo' which is not trivially copyable
  _Atomic(_Tp) __a_value;

So I am looking for a way to do something like:

foo a1;
fetch_add(&a1._x);
Josh Weinstein
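
A minimal sketch of what the `fetch_add(&a1._x)` call above might look like on GNU C compatible compilers (GCC, clang), assuming the `__atomic_fetch_add` builtin mentioned in the comments. The free function `fetch_add` and its `delta` parameter are hypothetical names, and `alignas(16)` is an added assumption so that a 16-byte compare-exchange on the whole object remains possible. It only shows the member increment; ISO C++ does not guarantee that this can be mixed safely with a whole-object CAS.

```cpp
#include <cstddef>

// Same layout as the question's foo; alignas(16) is an added assumption so
// the whole object is suitably aligned for a 16-byte compare-exchange.
struct alignas(16) foo
{
    std::size_t _x;
    void* _y;

    foo(std::size_t x = 3, void* y = nullptr): _x(x), _y(y){}
};

// Hypothetical helper (not part of any library): atomically adds `delta` to
// a plain size_t and returns the value it held before the addition.
// Relies on the GNU __atomic builtins, so it is not ISO C++ and will not
// compile on MSVC.
inline std::size_t fetch_add(std::size_t* p, std::size_t delta = 1)
{
    return __atomic_fetch_add(p, delta, __ATOMIC_SEQ_CST);
}
```

With this, `foo a1; fetch_add(&a1._x);` returns the old value (3 with the default constructor) and leaves `_x` at 4. Every potentially concurrent access to `_x` would also need to go through an atomic operation for this to be meaningful.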
  • Make sure it's sufficiently aligned. Use C++20 `std::atomic_ref`, or use the same compiler features that it uses (e.g. `__atomic_fetch_add` in GNU C compatible compilers https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). Or as an unsafe hack, point an `atomic*` at your var. Make sure all potentially-concurrent readers read through an `atomic*`. There was a recent Q&A where this was also the answer, so I think [Alternatives to std::atomic\_ref](https://stackoverflow.com/q/67620813) is a duplicate. – Peter Cordes Jun 08 '21 at 02:41
  • Oh, wait a minute, you want this to be safe on a member of a larger type that's also `atomic<>`. That's problematic; some compilers will just choose to make `foo` non-lockfree (e.g. MSVC). Also GCC7 and later will say it's not lock_free (because it doesn't have the expected read-only scalability, among other things), but will still use `lock cmpxchg16b` inside the libatomic function on CPUs that support that instruction (not first-gen AMD K8). See [How can I implement ABA counter with c++11 CAS?](https://stackoverflow.com/q/38984153) for more about hacking a `union`. – Peter Cordes Jun 08 '21 at 02:45
  • I may have been hasty closing this as a duplicate; can you clarify what kind of requirements you have for being able to mix `compare_exchange_strong` on whole `foo` objects with atomic read/write/RMW of one of the members? That would be no problem in x86-64 asm with `lock cmpxchg16b`, but a huge problem to do *portably* in ISO C++, unless `std::atomic_ref` has anything to say about using partially-overlapping references. – Peter Cordes Jun 08 '21 at 02:50
  • @PeterCordes This does not need to be ISO C++, just to have some equivalent amongst gcc, clang, and msvc. – Josh Weinstein Jun 08 '21 at 02:59
  • MSVC handles 16-byte objects as non-lock_free. I don't know if it ultimately ends up using `lock cmpxchg16b` in a library function or not. See [How can I implement ABA counter with c++11 CAS?](https://stackoverflow.com/q/38984153). That would be something to test, otherwise you'll have to roll your own handling of the 16-byte `foo` object, perhaps with [`_InterlockedCompareExchange128`](https://docs.microsoft.com/en-us/previous-versions/ttk2z1ws(v=vs.85)) on MSVC, and `__atomic` on the other compilers. (Make sure to use `-mcx16` with clang, or `-march=something` other than baseline x86-64) – Peter Cordes Jun 08 '21 at 03:00
  • Also, do you only care about compiling for x86-64? Or also AArch64? It does have LDAXP / [STLXP](https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/STLXP) (load-linked / store-exclusive a pair of 64-bit registers for 128-bit atomic RMWs, but [maybe not ARMv8.1 single-instruction RMW](https://cpufun.substack.com/p/atomics-in-aarch64) at that width.) 32-bit ISAs also often support 64-bit atomic RMW, so things are easier there if compilers use them. – Peter Cordes Jun 08 '21 at 04:02
  • Probably only x86-64 at this point, since this is for potential use in a database, which would primarily run on server-tier machines. But good to know there's some form of equivalent on ARM. – Josh Weinstein Jun 08 '21 at 04:24
  • Ok. There's an interesting question here that might not be a duplicate of those existing Q&As, but I think for it to be answerable (and thus ready to reopen) you're going to need to specify more details about what interactions you need to work, and which you can skip. And specify in the question which platforms / compilers you care about. And maybe how much "happens-to-work" you're prepared to accept. Of course the simple / slow ISO C++ way is to use a retry loop trying to CAS in a new `foo` with one member incremented, but compilers won't optimize that to a lock add on the member. – Peter Cordes Jun 08 '21 at 05:04
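
The "simple / slow" ISO C++ route from the last comment can be sketched as a retry loop around a whole-object compare-exchange. The helper name `fetch_add_x` is hypothetical, and the sketch assumes the struct is trivially copyable, has no padding bytes, and exposes a member named `_x`.

```cpp
#include <atomic>
#include <cstddef>

// The question's 16-byte type, repeated so the sketch is self-contained.
struct foo
{
    std::size_t _x;
    void* _y;

    foo(std::size_t x = 3, void* y = nullptr): _x(x), _y(y){}
};

// Hypothetical helper: atomically adds `delta` to the `_x` member of an
// atomically held struct by CAS-ing in a modified copy of the whole object.
// Returns the previous value of `_x`, mirroring fetch_add.
template <typename T>
std::size_t fetch_add_x(std::atomic<T>& target, std::size_t delta)
{
    T expected = target.load();
    for (;;) {
        T desired = expected;   // copy every member, then bump only _x
        desired._x += delta;
        // On failure, compare_exchange_weak reloads `expected` with the
        // current value, so the next iteration starts from fresh data.
        if (target.compare_exchange_weak(expected, desired))
            return expected._x; // on success, `expected` is the old value
    }
}
```

For the question's type this would be called as `std::atomic<foo> f1; fetch_add_x(f1, 1);`. As the comments note, GCC routes 16-byte operations through libatomic, so linking may need `-latomic`, and the loop will not be optimized into a single `lock add` on the member.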

0 Answers