How to atomically perform sequential load and store operations?

Question

Consider this code under GCC 4.8.0:

#include <atomic>

std::atomic<bool> a;
std::atomic<bool> b;

a.store( b.load() ); // want this to be atomic as a whole

How can I make the line above atomic as a whole? In other words, how can I atomically assign one atomic variable to another?

Are there any alternatives for std::atomic which allow this?

I have found __transaction_atomic {/* any code goes here */} which is activated on GCC by -fgnu-tm. With this, one can write anything in the block and it will be performed atomically.

Now the questions are:

Is __transaction_atomic implemented with mutexes? If so, what does the mutex actually lock?

Does the implementation of __transaction_atomic change depending on what is in its block? If so, how does it change?

I do not think that is possible. I do not think it is useful to have such operation. Why do you want it? — wilx, Jul 28 '13 at 21:15
If it was possible to implement this assignment atomically in the C++11 memory model, then `std::atomic` would have a copy assignment operator to do so. — Casey, Jul 28 '13 at 21:18
`__transaction_atomic` is implemented using [software transactional memory](http://en.wikipedia.org/wiki/Software_Transactional_Memory). STM may or may not use locks in its implementation - this is not really relevant as the overhead of its use in such a trivial case as this is certainly enormous, no matter how it's implemented. — JohannesD, Jul 29 '13 at 13:01
@JohannesD If the overhead of using `__transaction_atomic` is enormous for such a trivial example, then what is the best alternative for it? — Vahagn, Jul 29 '13 at 15:31

wilx · Answer 1 · 2013-07-30T12:43:58.857

I do not think that is possible, and I do not think such an operation would be useful. Why do you want it? If you have such a hard requirement, then you should just use a std::mutex locked around the a = b assignment.
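The mutex suggestion above can be sketched roughly as follows. This is only an illustration, not code from the answer: the `FlagPair` class and its member names are hypothetical, and it assumes that every access to the two flags goes through the same mutex (which is why plain `bool` members are used instead of `std::atomic<bool>`).

```cpp
#include <mutex>

// Hypothetical wrapper: both flags are plain bools guarded by one mutex,
// so every access, including the a = b copy, must take the lock.
class FlagPair {
public:
    void set_b(bool value) {
        std::lock_guard<std::mutex> lock(m_);
        b_ = value;
    }

    // Copies b into a as one indivisible step with respect to all other
    // FlagPair operations, because they all use the same mutex.
    void assign_a_from_b() {
        std::lock_guard<std::mutex> lock(m_);
        a_ = b_;
    }

    bool get_a() const {
        std::lock_guard<std::mutex> lock(m_);
        return a_;
    }

private:
    mutable std::mutex m_;
    bool a_ = false;
    bool b_ = false;
};
```

The copy is atomic only relative to code that takes the same lock; any thread that touched the flags without locking would break the guarantee.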

UPDATE

I have tested the __transaction_atomic block with Cygwin64's GCC 4.8.1 and this very short source

extern int a, b;

void foo ()
{ 
    __transaction_atomic
    {
        a = b;
    }
}

results in oodles of instructions calling ITM library functions:

_Z3foov:
.LFB0:
    pushq   %rdi     #
    .seh_pushreg    %rdi
    pushq   %rsi     #
    .seh_pushreg    %rsi
    subq    $200, %rsp   #,
    .seh_stackalloc 200
    movaps  %xmm6, 32(%rsp)  #,
    .seh_savexmm    %xmm6, 32
    movaps  %xmm7, 48(%rsp)  #,
    .seh_savexmm    %xmm7, 48
    movaps  %xmm8, 64(%rsp)  #,
    .seh_savexmm    %xmm8, 64
    movaps  %xmm9, 80(%rsp)  #,
    .seh_savexmm    %xmm9, 80
    movaps  %xmm10, 96(%rsp)     #,
    .seh_savexmm    %xmm10, 96
    movaps  %xmm11, 112(%rsp)    #,
    .seh_savexmm    %xmm11, 112
    movaps  %xmm12, 128(%rsp)    #,
    .seh_savexmm    %xmm12, 128
    movaps  %xmm13, 144(%rsp)    #,
    .seh_savexmm    %xmm13, 144
    movaps  %xmm14, 160(%rsp)    #,
    .seh_savexmm    %xmm14, 160
    movaps  %xmm15, 176(%rsp)    #,
    .seh_savexmm    %xmm15, 176
    .seh_endprologue
    movl    $43, %edi    #,
    xorl    %eax, %eax   #
    call    _ITM_beginTransaction    #
    testb   $2, %al  #, tm_state.4
    je  .L2  #,
    movq    .refptr.b(%rip), %rax    #, tmp67
    movl    (%rax), %edx     # b, b
    movq    .refptr.a(%rip), %rax    #, tmp66
    movl    %edx, (%rax)     # b, a
    movaps  32(%rsp), %xmm6  #,
    movaps  48(%rsp), %xmm7  #,
    movaps  64(%rsp), %xmm8  #,
    movaps  80(%rsp), %xmm9  #,
    movaps  96(%rsp), %xmm10     #,
    movaps  112(%rsp), %xmm11    #,
    movaps  128(%rsp), %xmm12    #,
    movaps  144(%rsp), %xmm13    #,
    movaps  160(%rsp), %xmm14    #,
    movaps  176(%rsp), %xmm15    #,
    addq    $200, %rsp   #,
    popq    %rsi     #
    popq    %rdi     #
    jmp _ITM_commitTransaction   #
    .p2align 4,,10
.L2:
    movq    .refptr.b(%rip), %rcx    #,
    call    _ITM_RU4     #
    movq    .refptr.a(%rip), %rcx    #,
    movl    %eax, %edx   # D.2368,
    call    _ITM_WU4     #
    call    _ITM_commitTransaction   #
    nop
    movaps  32(%rsp), %xmm6  #,
    movaps  48(%rsp), %xmm7  #,
    movaps  64(%rsp), %xmm8  #,
    movaps  80(%rsp), %xmm9  #,
    movaps  96(%rsp), %xmm10     #,
    movaps  112(%rsp), %xmm11    #,
    movaps  128(%rsp), %xmm12    #,
    movaps  144(%rsp), %xmm13    #,
    movaps  160(%rsp), %xmm14    #,
    movaps  176(%rsp), %xmm15    #,
    addq    $200, %rsp   #,
    popq    %rsi     #
    popq    %rdi     #
    ret
    .seh_endproc
    .ident  "GCC: (GNU) 4.8.1"
    .def    _ITM_beginTransaction;  .scl    2;  .type   32; .endef
    .def    _ITM_commitTransaction; .scl    2;  .type   32; .endef
    .def    _ITM_RU4;   .scl    2;  .type   32; .endef
    .def    _ITM_WU4;   .scl    2;  .type   32; .endef
    .section    .rdata$.refptr.b, "dr"
    .globl  .refptr.b
    .linkonce   discard
.refptr.b:
    .quad   b
    .section    .rdata$.refptr.a, "dr"
    .globl  .refptr.a
    .linkonce   discard
.refptr.a:
    .quad   a

This was compiled with the -O3 option.

Thanks for the update. I am not very familiar with assembly, so it's hard for me to derive the answers to my questions from the assembly code you posted. Could you explain in more detail what it actually does? — Vahagn, Jul 29 '13 at 11:37
@Vahagn: Well, generally, most atomic operations can be expressed either as a single instruction or as a sequence of fewer than ten instructions. What you see above is huge and calls into library functions. So, when you compare this to, e.g., CAS, it is IMHO at least one order of magnitude more complex and slower. — wilx, Jul 29 '13 at 11:49
Agreed, but can we find out from the assembly code whether it is implemented using a mutex? And if it is not, which is preferable: manually using a mutex or using __transaction_atomic? — Vahagn, Jul 29 '13 at 15:27
@Vahagn: See for yourself. The sources to libitm are there: http://gcc.gnu.org/viewcvs/gcc/trunk/libitm/ — wilx, Jul 30 '13 at 12:44

Krzysztof Narkowicz · Accepted Answer · score 2 · answered Jul 28 '13 at 21:44

In theory, an atomic variable swap could be implemented on the few CPUs that support DCAS (double compare-and-swap). In practice, no modern CPU supports DCAS, so it is not possible.
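Since a single CAS cannot cover two independent memory locations, one possible workaround (not proposed in the answers here, so treat it as an assumption-laden sketch) is to pack both flags into one atomic word and update that word with an ordinary CAS loop. The bit layout and the names `kBitA`, `kBitB`, and `assign_a_from_b` are hypothetical, and this only applies when the two flags can share storage.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout: bit 0 holds flag a, bit 1 holds flag b.
constexpr std::uint8_t kBitA = 0x1;
constexpr std::uint8_t kBitB = 0x2;

// Performs "a = b" on the packed word as one atomic read-modify-write.
void assign_a_from_b(std::atomic<std::uint8_t>& word) {
    std::uint8_t expected = word.load();
    std::uint8_t desired;
    do {
        // Compute the new word from the snapshot in 'expected'.
        desired = (expected & kBitB)
                      ? static_cast<std::uint8_t>(expected | kBitA)
                      : static_cast<std::uint8_t>(expected & ~kBitA);
        // On failure, compare_exchange_weak reloads 'expected' and we retry.
    } while (!word.compare_exchange_weak(expected, desired));
}
```

Because both flags live in the same word, the read of b and the write of a happen in a single successful compare-exchange, which is what two separate `std::atomic<bool>` objects cannot provide.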
