
Is it possible to perform atomic and non-atomic ops on the same memory location?

I ask not because I actually want to do this, but because I'm trying to understand the C11/C++11 memory model. They define a "data race" like so:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. -- C11 §5.1.2.4 p25, C++11 § 1.10 p21

It's the "at least one of which is not atomic" part that is troubling me. If it weren't possible to mix atomic and non-atomic ops, it would just say "on an object which is not atomic."

I can't see any straightforward way of performing non-atomic operations on atomic variables. std::atomic<T> in C++ doesn't define any operations with non-atomic semantics. In C, all direct reads/writes of an atomic variable appear to be translated into atomic operations.
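To make that concrete, here is a minimal C++11 sketch (the helper name `plain_syntax_is_atomic` is made up for illustration). As far as I can tell, each statement that looks like an ordinary read or write is an overloaded operator that performs a sequentially consistent atomic operation:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: every access below is an atomic operation,
// even the ones written with plain assignment/read syntax.
int plain_syntax_is_atomic() {
    std::atomic<int> a(0);

    a = 5;       // equivalent to a.store(5, std::memory_order_seq_cst)
    int x = a;   // equivalent to a.load(std::memory_order_seq_cst)
    a += 2;      // atomic read-modify-write, like a.fetch_add(2)
    ++a;         // atomic increment, like a.fetch_add(1)

    assert(x == 5);
    return a.load();   // 5 + 2 + 1
}
```

Calling `plain_syntax_is_atomic()` returns 8; nothing in it is a non-atomic access to `a` apart from the initialization.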

I suppose memcpy() and other direct memory operations might be a way of performing a non-atomic read/write on an atomic variable, i.e. memcpy(&atomicvar, othermem, sizeof(atomicvar))? But is this even defined behavior? In C++, std::atomic is not copyable, so would it be defined behavior to memcpy() it in C or C++?
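The "not copyable" part can be checked at compile time. The sketch below only shows that the copy operations are deleted and that moving a value between two atomics goes through an atomic load and store; it deliberately says nothing about whether memcpy() on an atomic is defined, since that is the open question. The names `copy_value` and `copy_value_demo` are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// std::atomic<T> deletes its copy constructor and copy assignment.
static_assert(!std::is_copy_constructible<std::atomic<int>>::value,
              "std::atomic<int> is not copy constructible");
static_assert(!std::is_copy_assignable<std::atomic<int>>::value,
              "std::atomic<int> is not copy assignable");

// Hypothetical helper: transfer the value with one atomic load
// followed by one atomic store (two separate atomic operations).
void copy_value(const std::atomic<int>& from, std::atomic<int>& to) {
    to.store(from.load());
}

int copy_value_demo() {
    std::atomic<int> src(7), dst(0);
    copy_value(src, dst);
    assert(src.load() == 7);   // the source is left unchanged
    return dst.load();
}
```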

Initialization of an atomic variable (whether through a constructor or atomic_init()) is defined to not be atomic. But this is a one-time operation: you're not allowed to initialize an atomic variable a second time. Placement new or an explicit destructor call would also not be atomic. But in all of these cases, it doesn't seem like it would be defined behavior anyway to have a concurrent atomic operation that might be operating on an uninitialized value.
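For reference, this is roughly what the well-behaved version of the initialization case looks like: a sketch in which the non-atomic initialization is ordered before every atomic operation because starting a thread synchronizes with the beginning of that thread. The function name and the thread/iteration counts are arbitrary choices for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: non-atomic initialization followed by atomic use.
int init_then_share() {
    std::atomic<int> counter(100);   // initialization: not an atomic operation

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        // Constructing std::thread synchronizes with the start of the new
        // thread, so the initialization above happens before each fetch_add.
        workers.emplace_back([&counter] {
            for (int i = 0; i < 1000; ++i) counter.fetch_add(1);
        });
    }
    for (auto& w : workers) w.join();

    assert(counter.load() >= 100);
    return counter.load();   // 100 + 4 * 1000
}
```

Here the conflicting actions (initialization and the `fetch_add` calls) are in different threads and one of them is not atomic, but there is no data race because one happens before the other.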

Performing atomic operations on non-atomic variables seems totally impossible: neither C nor C++ defines any atomic functions that can operate on non-atomic variables.

So what is the story here? Is it really about memcpy(), or initialization/destruction, or something else?

Josh Haberman
  • How about invoking the destructor? – 5gon12eder Jan 31 '16 at 02:02
  • Nothing about memcpy is atomic. – Jeff Hammond Jan 31 '16 at 02:53
  • Your characterization of C11 atomics is wrong according to http://en.cppreference.com/w/c/atomic... – Jeff Hammond Jan 31 '16 at 03:03
  • @5gon12eder: hmm, interesting idea. If an atomic op is racing with `delete` this would be invalid for other reasons, but perhaps placement new and/or an explicit destructor call would qualify as non-atomic operations on an atomic variable. – Josh Haberman Jan 31 '16 at 03:13
  • @Jeff of course memcpy() isn't atomic, I mentioned memcpy() as a potential way of performing a *non-atomic* operation on an atomic variable. – Josh Haberman Jan 31 '16 at 03:15
  • @Jeff I'm not sure what you are saying I've mischaracterized. In any case, cppreference is a secondary source, and I think it helps to discuss matters of correctness in the context of the standards documents instead. – Josh Haberman Jan 31 '16 at 03:16
  • @JoshHaberman You said "The C11 standard doesn't appear to require that the argument you pass points to an atomic variable", which is not correct according to Section 7.17.1 of [N1570](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf), where it says "An A refers to one of the atomic types." – Jeff Hammond Jan 31 '16 at 03:23
  • @Jeff So it does. Thanks for the correction on that part. – Josh Haberman Jan 31 '16 at 03:50
  • So what is the exact question here? – David Haim Jan 31 '16 at 09:56
  • @DavidHaim Is it possible to mix atomic/non-atomic ops on the same memory, and if so, how? From what I can tell right now, the answer is: atomic ops on non-atomic vars is impossible, non-atomic ops on atomic vars include: initialization, memcpy()/memmove()/etc, and possibly placement new / placement destruct. – Josh Haberman Jan 31 '16 at 19:01
  • @DavidHaim If the C++ standard didn't explicitly say that mixing an atomic and non-atomic operation was UB, would there be any way to mix them that didn't already result in UB anyway? Atomic types only support atomic operations and non-atomic types only support non-atomic operations, so how could you mix them without doing something that's already UB? – David Schwartz Feb 02 '16 at 20:16

4 Answers


I think you're overlooking another case: the reverse order. Consider an initialized int whose storage is reused to create a std::atomic_int. All atomic operations happen after its ctor finishes, and therefore on initialized memory. But any concurrent, non-atomic access to the now-overwritten int has to be barred as well.

(I'm assuming here that the storage lifetime is sufficient and plays no role)

I'm not entirely sure because I think that the second access to int would be invalid anyway as the type of the accessing expression int doesn't match the object's type at the time (std::atomic<int>). However, "the object's type at the time" assumes a single linear time progression which doesn't hold in a multi-threaded environment. C++11 in general has that solved by making such assumptions about "the global state" Undefined Behavior per se, and the rule from the question appears to fit in that framework.

So perhaps rephrasing: if a single memory location contains an atomic object as well as a non-atomic object, and if the destruction of the earliest created (older) object is not sequenced-before the creation of the other (newer) object, then access to the older object conflicts with access to the newer object unless the former happens before the latter.
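A sketch of the storage-reuse scenario, written so that it stays on the defined side of the rule: the non-atomic access to the int is ordered before the atomic object is created, here by joining the writer thread. All names are hypothetical, and it assumes that `std::atomic<int>` is at least as large as `int`, which the `static_assert` checks.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <thread>

int reuse_storage() {
    using AtomicInt = std::atomic<int>;
    static_assert(sizeof(AtomicInt) >= sizeof(int), "storage must fit both");

    // Raw storage that is suitably sized and aligned for either object.
    alignas(AtomicInt) alignas(int) unsigned char storage[sizeof(AtomicInt)];

    // Phase 1: the storage holds a plain int, accessed non-atomically.
    int* plain = new (storage) int(41);
    std::thread writer([plain] { *plain += 1; });
    writer.join();            // the write happens before everything below
    int carried = *plain;     // 42

    // Phase 2: reuse the storage for an atomic object. This ends the
    // lifetime of the int; `plain` must not be used from here on.
    AtomicInt* shared = new (storage) AtomicInt(carried);
    std::thread a([shared] { shared->fetch_add(1); });
    std::thread b([shared] { shared->fetch_add(1); });
    a.join();
    b.join();

    int result = shared->load();
    assert(result == carried + 2);
    shared->~AtomicInt();
    return result;
}
```

If `writer` were still running when the atomic object was created, the plain write and the atomic operations would be conflicting actions with no happens-before between them, which is the situation described above.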

MSalters
  • Interesting theory. I'm not sure I buy it -- it's hard to believe that the idea of two different objects both sort-of existing at the same memory in some kind of Schrödinger sense could possibly be considered defined behavior, such that this extra specificity in the definition of "data race" is necessary to make it undefined. – Josh Haberman Feb 02 '16 at 22:18
  • @JoshHaberman: Well, you need a rule to make it Undefined. The normal rule is that two independent threads have their own timelines, and events on each timeline are only sequenced relative to other events on that same timeline. – MSalters Feb 02 '16 at 23:08
  • After reflecting on this some more, this seems like the most plausible answer. It is similar to the "`atomic_init()` is not atomic" answer, but more general. If we consider that object lifetime beginning and ending is not atomic, then it would make sense that you need to ensure "happens before" for it. – Josh Haberman Feb 10 '16 at 17:17
  • `memcpy()` still seems like another plausible alternative, though it's not clear to me whether `memcpy()` of atomic variables is defined behavior. – Josh Haberman Feb 10 '16 at 17:18

Disclaimer: I am not a parallelism guru.

Is it possible to mix atomic/non-atomic ops on the same memory, and if so, how?

You can write it in the code and compile it, but it will probably yield undefined behaviour.

When talking about atomics, it is important to understand what kind of problems they solve.

As you might know, what we call "memory" for short is a multi-layered set of entities that are capable of holding data.
First we have the RAM, then the cache lines, then the registers.

On single-core processors, we don't have any synchronization problems. On multi-core processors we have all of them. Every core has its own set of registers and cache lines.

This causes a few problems.

The first of them is memory reordering: the CPU may decide at runtime to scramble some read/write instructions to make the code run faster. This may yield some strange results that are completely invisible in the high-level code that produced this set of instructions.
The most classic example of this phenomenon is the "two threads, two integers" example:

int i=0;
int j=0;
thread a -> i=1, then print j
thread b -> j=1 then print i;

Logically, the result "00" should be impossible: if a finishes first, the result is "01"; if b finishes first, the result is "10"; and if both run at the same time, the result may be "11". Yet if you build a small program that imitates this situation and run it in a loop, you will very quickly see the result "00".
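For comparison, here is a sketch of the same experiment with `i` and `j` made atomic and accessed with the default sequentially consistent ordering. Under that ordering the "00" outcome is ruled out by the memory model. The function name is invented, and the pair it returns is (what thread a read from `j`, what thread b read from `i`).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: one run of the "two threads, two integers" test.
std::pair<int, int> store_buffer_once() {
    std::atomic<int> i(0), j(0);
    int seen_by_a = -1;   // written only by thread a, read after join
    int seen_by_b = -1;   // written only by thread b, read after join

    std::thread a([&] { i.store(1); seen_by_a = j.load(); });
    std::thread b([&] { j.store(1); seen_by_b = i.load(); });
    a.join();
    b.join();

    // With seq_cst stores and loads, at least one thread sees the other's 1.
    assert(seen_by_a == 1 || seen_by_b == 1);
    return std::make_pair(seen_by_a, seen_by_b);
}
```

Running `store_buffer_once()` in a loop may produce (0, 1), (1, 0) or (1, 1), but not (0, 0). With plain `int` variables the program would contain a data race, so no outcome is guaranteed at all.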

Another problem is memory visibility. As I mentioned before, a variable's value may be cached in one of the cache lines or stored in one of the registers. When the CPU updates a variable's value, it may delay writing the new value back to RAM. It may keep the value in the cache or a register because it was told (by the compiler's optimizations) that the value will be updated again soon, so to make the program faster it updates the value again and only then writes it back to RAM. This may cause undefined behaviour if another CPU (and consequently another thread or process) depends on the new value.

For example, look at this pseudocode:

bool b = true;
while (b) -> print 'a'
new thread -> sleep 4 seconds -> b=false;

The character 'a' may be printed forever, because b may be cached and never updated.
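A sketch of the version that does terminate, assuming `b` is a `std::atomic<bool>`. To keep it quick to run, the 4 second sleep is shortened to 20 ms and the loop counts iterations instead of printing; both changes, and the function name, are mine.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: spin on an atomic flag until another thread clears it.
long spin_until_cleared() {
    std::atomic<bool> b(true);

    std::thread stopper([&b] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        b.store(false);           // atomic store, becomes visible to the loop
    });

    long iterations = 0;
    while (b.load()) {            // atomic load on every iteration
        ++iterations;
    }
    stopper.join();

    assert(!b.load());
    return iterations;            // number of times the loop body ran
}
```

Because every read of `b` is an atomic load, the compiler cannot hoist it out of the loop, and the store from the other thread is eventually observed.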

There are many more problems when dealing with parallelism.

Atomics solve these kinds of issues by (in a nutshell) telling the compiler/CPU how to read and write data to/from RAM correctly without unwanted scrambling (read about memory orders). A memory order may force the CPU to write its values back to RAM, or to read values from RAM even if they are cached.
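As a small illustration of memory orders, here is a release/acquire sketch in which a plain `int` is published through an atomic flag. The names are made up. The non-atomic write and read of `payload` do not race, because the release store synchronizes with the acquire load that observes it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: pass plain data between threads via an atomic flag.
int publish_and_read() {
    int payload = 0;                    // plain, non-atomic data
    std::atomic<bool> ready(false);
    int received = -1;

    std::thread producer([&] {
        payload = 42;                                   // non-atomic write
        ready.store(true, std::memory_order_release);   // publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the flag is observed as true
        }
        received = payload;   // non-atomic read, ordered after the write
    });

    producer.join();
    consumer.join();
    assert(received == payload);
    return received;
}
```

`publish_and_read()` returns 42. Note that the atomic and non-atomic operations here act on different memory locations; the atomic flag is what orders the plain accesses.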

So, although you can mix non-atomic actions with atomic ones, you are only doing part of the job.

For example, let's go back to the second example:

atomic bool b = true;
while (reload b) -> print 'a'
new thread -> b = false (non-atomically)

So although one thread re-reads the value of b from RAM again and again, the other thread may never write false back to RAM.

So although you can mix these kinds of operations in the code, it will yield undefined behavior.

David Haim
  • I'm specifically asking about atomic/non-atomic mixing that does *not* invoke undefined behavior. I submit that if all mixing was undefined behavior, the definition of "data race" would be different. See details in my question. – Josh Haberman Feb 01 '16 at 09:43

I'm interested in this topic since I have code in which sometimes I need to access a range of addresses serially, and at other times to access the same addresses in parallel with some way of managing contention.

So this is not exactly the situation posed by the original question, which (I think) implies concurrent, or nearly so, atomic and non-atomic operations in parallel code, but it is close.

I have managed, by some devious casting, to persuade my C11 compiler to allow me to access an integer and, much more usefully, a pointer both atomically and non-atomically ("directly"), having established that both types are officially lock-free on my x86_64 system. My, possibly simplistic, interpretation of that is that the sizes of the atomic and non-atomic types are the same and that the hardware can update such types in a single operation.

I definitely would not attempt to mix both types of access to an address in a parallel context; I think that would be doomed to fail. However, I have been successful in using "direct" syntax operations in serial code and "atomic" syntax in parallel code, giving me the best of both worlds: the fastest possible access (and much simpler syntax) in serial code, and safely managed contention in parallel code.

So you can do it so long as you don't try to mix both methods in parallel code and you stick to using lock-free types, which probably means up to the size of a pointer.
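For anyone who wants the same serial/parallel split without the casting, one alternative (not what I did above, and written in C++ rather than C11) is to keep the variable atomic throughout and use relaxed loads and stores in the serial phases. My understanding is that on x86_64 these typically compile to ordinary moves, but that is an assumption about code generation, not something the standard promises. The names and counts below are arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: serial phase, then parallel phase, then serial again.
long serial_then_parallel() {
    std::atomic<long> total(0);

    // Serial phase: only this thread touches `total`, so relaxed
    // load/store pairs are enough and no read-modify-write is needed.
    for (int i = 1; i <= 10; ++i) {
        long current = total.load(std::memory_order_relaxed);
        total.store(current + i, std::memory_order_relaxed);
    }
    assert(total.load(std::memory_order_relaxed) == 55);

    // Parallel phase: contended updates use atomic read-modify-write.
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&total] {
            for (int i = 0; i < 1000; ++i)
                total.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();   // join orders the phases

    // Serial again: a relaxed load sees all updates made before the joins.
    return total.load(std::memory_order_relaxed);   // 55 + 4 * 1000
}
```

The thread creation and the joins are what separate the phases, which is the same "don't mix both methods in parallel code" rule as above, just without relying on the atomic and non-atomic types sharing a representation.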