2

I'm trying to understand how C11 memory model works and wrote two functions containing expressions that conflict (in the sense of 5.1.2.4(p4)):

struct my_struct{
    uint64_t first;
    int64_t second;
} * _Atomic instance;

void* set_first(void *ignored){
    uint64_t i = 0;
    while(1){
        struct my_struct *ms = atomic_load_explicit(&instance, memory_order_acquire);
        ms -> first = i++;
        atomic_store_explicit(&instance, ms, memory_order_release);
        sleep(1);
    }
}

void* print_first(void *ignored){
    while(1){
        struct my_struct *ms = atomic_load_explicit(&instance, memory_order_acquire);
        uint64_t current = ms -> first;
        char buf[100];
        memset(buf, '\0', sizeof(buf));
        sprintf(buf, "%" PRIu64 "\n", current);
        fputs_unlocked(buf, stdout);
        sleep(2);
    }
}

And the main function:

int main(void){
    struct my_struct tmp = {.first = 0, .second = 0};
    atomic_init(&instance, &tmp);
    printf("main\n");
    pthread_t set_thread;
    pthread_create(&set_thread, NULL, &set_first, NULL);

    pthread_t print_thread;
    pthread_create(&print_thread, NULL, &print_first, NULL);
    while(1){
        sleep(100);
    }
}

So I tried to prove if the program contains no data-races. Here are my thoughts:

  1. We know that a release operation on an atomic object synchronizes with an acquire operation on the object. So atomic_store_explicit(&instance, ms, memory_order_release); in the set_first synchronizes with atomic_load_explicit(&instance, memory_order_acquire) in the print_first.

  2. Since the side effect ms -> first = i++ in the set_first function appeared before the atomic_store_explicit(&instance, ms, memory_order_release); in the program text I assumed it is sequenced-before it (This is I'm not sure about, could not find any normative references).

  3. Combining bullets 1. and 2. implies that ms -> first = i++ inter-thread happens before atomic_load_explicit(&instance, memory_order_acquire); so they are in the happens before relationships.

  4. Applying the data-race definition we conclude that confliting actions uint64_t current = ms -> first; in the print_first function and ms -> first = i++; in the set_first function do not produce data-race.

So the behavior seems to be well-defined.

The doubtful thing is assumed that ms -> first = i++; sequenced before atomic_store_explicit(&instance, ms, memory_order_release); because they occurred one before another in the program text.

Is it correct or the program contains data races?

Some Name
  • 8,555
  • 5
  • 27
  • 77

1 Answers1

3

Modifying non-atomic objects through a non-_Atomic pointer is not inherently data-race UB. (e.g. you could have an algorithm that did int *p = shared_ptr++; to have each thread grab its own slot in a non-atomic array.)

But in this case, you have a clear case of UB, because you have 2 threads accessing main's tmp.first, and they're not both reads.


A store with mo_release is sequenced after any previous stores (and loads), including non-atomic stores like ms->first = .... That's the point of release-stores vs. relaxed.

But the flaw in your reasoning is in step 1: atomic_store_explicit(&instance, ms, memory_order_release) in set_first only synchronizes-with acquire loads that see the value stored! Acquire loads in other threads don't magically wait for a release store that hasn't happened yet. The guarantee is that if/when you do load the value stored by a release store1, you also see all earlier things from that thread.

If the acquire load happens before the release store (in the global order, if there is one), then there's no synchronization with it.

The acquire-load in both threads can happen simultaneously, and then the fox is in the henhouse: ms -> first = i++; and uint64_t current = ms -> first; are running with no synchronization.

It's totally irrelevant that the writing thread will later do a release-store to store the same value back into instance.


Footnote 1: (The "release-sequence" language in the standard extends this to seeing the result of a RMW operation that modified the result of the initial release-store, and so on.)


As far as other threads are concerned, the atomic_load_explicit in set_first is basically irrelevant. You might as well hoist it out of the loop.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • I thought atomic types were permitted to have different size and representation from non-atomic. How can this ever not be UB? – R.. GitHub STOP HELPING ICE Apr 13 '19 at 13:58
  • 3
    @R..: The title is tricky. It's an `_Atomic` pointer to a *separate* non-atomic object, with multiple threads dereferencing the same `_Atomic T*` to reach the same `T` object. This is not asking about casting away the `_Atomic` qualifier; see [Assigning pointers to atomic type to pointers to non atomic type](//stackoverflow.com/q/55547081) for that. – Peter Cordes Apr 13 '19 at 14:01
  • 2
    You mean `T *_Atomic`. :-) – R.. GitHub STOP HELPING ICE Apr 13 '19 at 18:14
  • 1
    _only synchronizes-with acquire loads that see the value stored!_ Actually, I re-read the definition of the **synchronizes with** relation and there was the following condition I did not pay attention to (`5.1.2.4/p11`): _and reads a value written by any side effect in the release sequence headed by A._. – Some Name Apr 13 '19 at 19:42
  • 1
    @SomeName: Yes, that's the critical part of the language in the standard that means acquire loads don't have to know about and wait for all possible future release stores. :P (And thus are wait-free, for lockless atomic types.) – Peter Cordes Apr 14 '19 at 00:41
  • Since as defined in `5.1.2.4(p4)` the modification order is a _total order_ relation there would be no data race if `my_struct::first` was declared as `_Atomic uint64_t first;` (total order guarantees no ambiguity on which side-effect affecting the member is visible). – Some Name Apr 14 '19 at 08:15
  • 1
    @SomeName: Yup, atomic pointer to atomic data would have no data-race UB. Your program logic would still have a garden-variety race, with no guarantee of which value the reader would actually see (pre or post increment). A total order would *exist*, but you'd have no guarantee of what that order is; that would be up to the vagaries of hardware and thread scheduling. It would be possible for printing to sometimes skip one increment and sometimes skip 2, if the threads happened to get into very close lock-step. (Unlikely because is prints before sleeping, so will usually sleep for longer.) – Peter Cordes Apr 14 '19 at 08:21
  • I don't really understand the problem with _pre or post increment_. Consider the case if `_Atomic uint64_t first` and the value is stored as `atomic_store_explicit(&first, i++, memory_order_release);`. There is a sequence point after evaluation of `i++` and before the actual call to `atomic_store_explicit(&first, i++, memory_order_release);`. So it means `i++` is sequenced before the call to `atomic_store_explicit(&first, i++, memory_order_release)`. – Some Name Apr 15 '19 at 09:33
  • @SomeName: Huh, what does that have to do with anything? There was never any question about what the storing thread stores, just which store each load in the loading thread sees. Whether it sees the stores from before (pre) storing `i++` or from after (post) storing `i++`. It's not exactly an *increment* of `ms->first`, just a store of an increasing sequence, but same difference for the reader. – Peter Cordes Apr 15 '19 at 10:02