Can two sequential assignment statements in C be executed on hardware out of order?

Question

Given the following C program:

static char vals[ 2 ] = {0, 0};

int main() {

char *a = &vals[0];
char *b = &vals[1];

while( 1 ) {

    SOME_STUFF()

    // non-atomic operations in critical section
    if( SOME_CONDITION() )
        {
        *a = 1;
        *b = 2;
        }
    else
        {
        *a = 0;
        *b = 0;
        }


    SOME_OTHER_STUFF()

    }

return 0;
}

int async_interrupt( void ) {

PRINT( a );
PRINT( b );
}

Is it possible for the hardware to actually load the value 2 into the memory location &vals[1] first, such that an interrupt routine could execute and see vals[1] == 2 and vals[0] == 0?

If this is possible, any description of the load/store operations that would result in this scenario would be much appreciated.

EDIT 1: Added a little more context to the code section. Unfortunately, I don't have the machine code from the compiled source.

Yes; it's also possible for the program to be optimized to `int main() { return 0; }` since it has no observable behaviour — M.M, Nov 19 '18 at 22:52
It would improve the question to post an example of the "interrupt routine" you ask about. It would be undefined behaviour if an interrupt routine tried to access `a` or `b`, generally speaking, so this question might be moot. Also, some platforms provide stronger guarantees for interrupt routines than Standard C does. — M.M, Nov 19 '18 at 23:08
Also UB if the interrupt routine accessed `vals[0]` or `vals[1]`. (`a` and `b` are locals with automatic storage, so there's no good way for an interrupt to get them. Not sure what the point of them is.) — Peter Cordes, Nov 19 '18 at 23:23

score 10 · Answer 1 · answered Nov 19 '18 at 23:07

C doesn't run on hardware directly. It has to be compiled first.

The specifics of undefined behaviour (like unsynchronized reads of non-atomic variables) depend entirely on the implementation: that includes compile-time reordering in the compiler and, depending on the target CPU architecture, the runtime reordering rules of that ISA.

Reads/writes of non-atomic variables are not considered an observable side-effect in C or C++, so they can be optimized away and reordered up to the limit of preserving the behaviour of the program as a whole. (The exception is a program with undefined behaviour: optimizations can do anything in that case, even if the compiler can't "see" at compile time that there will be UB.)

See also https://preshing.com/20120625/memory-ordering-at-compile-time/
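As a minimal sketch of how the ordering could be pinned down when the handler runs on the same thread as the stores, the array can be made `_Atomic` and the two stores separated by `atomic_signal_fence`. This is an illustration under assumptions, not the OP's code: a C signal stands in for the "interrupt", and `on_signal`, `set_pair`, `seen` and `run_demo` are hypothetical names. It says nothing about a handler running on another core.

```c
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>

/* A handler may only touch lock-free atomics; 2 means "always lock-free". */
static_assert(ATOMIC_CHAR_LOCK_FREE == 2, "char atomics must be lock-free");

static _Atomic unsigned char vals[2];  /* shared with the handler */
static _Atomic unsigned char seen[2];  /* snapshot taken by the handler (hypothetical) */

/* Hypothetical stand-in for the question's async_interrupt(). */
void on_signal(int sig) {
    (void)sig;
    /* Read in the opposite order of the writes: if vals[1] is new,
       the fence pair guarantees vals[0] is new as well. */
    unsigned char b = atomic_load_explicit(&vals[1], memory_order_relaxed);
    atomic_signal_fence(memory_order_acquire);
    unsigned char a = atomic_load_explicit(&vals[0], memory_order_relaxed);
    atomic_store_explicit(&seen[0], a, memory_order_relaxed);
    atomic_store_explicit(&seen[1], b, memory_order_relaxed);
}

/* Main-line code: the compiler may not sink the first store below the fence
   or hoist the second store above it. */
void set_pair(unsigned char a, unsigned char b) {
    atomic_store_explicit(&vals[0], a, memory_order_relaxed);
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&vals[1], b, memory_order_relaxed);
}

/* Installs the handler, writes the pair, then raises the signal
   synchronously. Returns 0 on success. */
int run_demo(void) {
    if (signal(SIGINT, on_signal) == SIG_ERR) return -1;
    set_pair(1, 2);
    return raise(SIGINT);
}
```

`atomic_signal_fence` only constrains the compiler, which is enough for a handler that interrupts the same thread; a handler on a different core would need real inter-thread synchronization instead.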

Peter Cordes

As I commented in the thread under your answer: **C only guarantees that causality applies within a single thread.** The OP wants to read `vals[0]` and `vals[1]` *from an interrupt handler*, which runs asynchronously from the main thread so C doesn't guarantee anything about what it will find if it reads `vals[0..1]` without synchronization, and without that array being `_Atomic`. The whole point of `_Atomic` is to guarantee causality in cases like this that aren't synchronous single-threaded execution. Your answer arguing based on causality for non-atomic C variables is misleading at best. – Peter Cordes Nov 19 '18 at 23:43
@EdwinBuck: Was about to reply in the thread under your answer- I think you missed that the OP really did say "such that an interrupt routine could ...". So yes, we are talking about a case that goes outside the bounds of what C's as-if rule requires any code-transformations to preserve. Since you deleted your answer, I'm guessing you noticed that in the question now :) – Peter Cordes Nov 19 '18 at 23:50
Peter, if we assume that the C is compiled and linked such that the machine code is not optimized, do you know of any hardware nuances that could have somehow reordered the loading/storing of those values, such that the interrupt sees the second assigned, but not the first? – leo1 Nov 23 '18 at 14:36
@leo1: assuming a compiler like GCC where un-optimized means all variables are treated similar to `volatile`, then for an interrupt on the *same* core that was running the main thread, no, not on a normal mainstream CPU architecture. That's kind of pointless and un-interesting, though. You'd never want to use un-optimized code in production. The Mill CPU architecture has stores that don't become visible (even to itself) for multiple cycles, allowing explicit parallelism, but I can't think of a reason why a compiler would use a longer delay for the first store in fully un-optimized code. – Peter Cordes Nov 23 '18 at 17:41
@PeterCordes, there are many reasons for using non-optimized code in production (e.g. safety critical flight software, automotive software, any environment that doesn't desire to introduce compiler bugs). But, that's beside the point. – leo1 Nov 25 '18 at 22:58
@EdwinBuck "_thus they are required to be translated into code segments which preserve the ordering_" What is a "code segment"? When does it start? – curiousguy Nov 27 '18 at 17:19
@curiousguy: I assume he meant "basic blocks", or just "blocks"/chunks of asm, like the definition for a whole function. Note that Edwin's deleted his misleading answer after I replied (but not the comment), so I don't think we need to pick at it any further. – Peter Cordes Nov 27 '18 at 20:19
Comment deleted to assist in clarity – Edwin Buck Nov 28 '18 at 01:28

BeeOnRope · Accepted Answer · 2018-11-23T23:40:02.563

4

Yes, it is possible because the compiler might re-order those statements as described in Peter's answer.

However, you might still be wondering about the other half: what hardware can do. Under the assumption that your stores end up in the assembly in the order you show in your source¹, if an interrupt occurs on the same CPU that is running this code, from within the interrupt you'll see everything in a consistent order. That is, from within the interrupt handler, you'll never see the second store having completed, but the first not. The only scenarios you'll see are both not having completed, both completed or the first having completed and the second not.

If multiple cores are involved, and the interrupt may run on a different core, then you simply have the classic cross-thread sharing scenario, whether it is an interrupt or not, and what the other core can observe depends on the hardware memory model. For example, on the relatively strongly ordered x86, you would always observe the stores in order, whereas on the more weakly ordered ARM or POWER memory models you could see the stores out of order.

In general, however, the CPU may be doing all sorts of reordering: the ordering you see within an interrupt handler is a special case where the CPU will restore the appearance of sequential execution at the point of handling the interrupt. The same is true of any case where a thread observes its own stores. When stores are observed by a different thread, however, what happens depends on the hardware memory model, which varies a lot between architectures.
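For the cross-core case, a portable way to get a defined ordering is to publish the data through a release store that the observer reads with an acquire load. The sketch below is an assumption-laden illustration rather than anything from the question: `ready`, `publish` and `try_read` are hypothetical names, and it only covers a single one-shot publication (writing `vals` again while another core is reading would still be a data race).

```c
#include <stdatomic.h>
#include <stdbool.h>

static char vals[2];       /* plain, non-atomic data as in the question */
static atomic_bool ready;  /* hypothetical flag; zero-initialized, i.e. false */

/* Writer side: both plain stores happen before the release store,
   so neither the compiler nor the CPU may move them past it. */
void publish(char a, char b) {
    vals[0] = a;
    vals[1] = b;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Observer side (another thread or core): once the acquire load sees
   true, both stores to vals are guaranteed to be visible. */
bool try_read(char out[2]) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;      /* nothing published yet */
    out[0] = vals[0];
    out[1] = vals[1];
    return true;
}
```

On x86 the release store and acquire load typically compile to ordinary moves, while on ARM or POWER they emit the barriers or ordered instructions that the weaker memory model needs.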

¹ Assuming also that they show up separately - there is nothing stopping a smart compiler from noticing you are assigning to adjacent values in memory and hence transforming the two stores into a single wider one. Most compilers can do this in at least some scenarios.
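To picture the merging described in the footnote, here is a hand-written sketch of the two forms. Whether a given compiler actually performs this transformation on the question's code is an assumption; `store_separately` and `store_merged` are hypothetical names, and `memcpy` is used so the result does not depend on endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint16_t) == 2, "sketch assumes a 2-byte uint16_t");

static char vals[2];

/* What the source says: two separate one-byte stores. */
void store_separately(void) {
    vals[0] = 1;
    vals[1] = 2;
}

/* What an optimizer might emit instead: one two-byte store that
   writes both elements at once. */
void store_merged(void) {
    const char both[2] = {1, 2};
    uint16_t wide;
    memcpy(&wide, both, sizeof wide);  /* build the combined value */
    memcpy(vals, &wide, sizeof wide);  /* a single 16-bit store */
}
```

Both functions leave `vals` in the same state, which is why the as-if rule permits the substitution for non-atomic, non-`volatile` objects. With a merged store there is no "first" and "second" store left to observe.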

edited Nov 23 '18 at 23:40

answered Nov 20 '18 at 03:03

BeeOnRope

I believe this is what I'm looking for: "from within the interrupt handler, you'll never see the second store having completed, but the first not." However, with my limited understanding of architectures, I was thinking that the second store could possibly happen first due to some hardware nuances. Maybe that isn't the case. – leo1 Nov 23 '18 at 14:32
@leo - an interrupt that actually interrupts the code in question (running on the same CPU that the code in question was running on) will always see a consistent view of the stores, just like code running on the CPU will see its stores in source order. If another CPU is involved, then interrupt code (or any code really) running on that second CPU _concurrently_ with the code doing the stores may see them out of order, depending on the hardware memory model. – BeeOnRope Nov 23 '18 at 15:16
@curiousguy - I'm not quite sure what you are referring to. If multiple threads are used and there is more than one CPU, and interrupts are not involved, then sure you can also see an inconsistent store order, depending on the hardware. Maybe you could clarify your question or create a separate one. – BeeOnRope Nov 27 '18 at 17:38
@curiousguy - I'm considering the case where an interrupt occurs as specified by the OP and shown in their `main()` method. So I assume there are two execution contexts: the normal user context (single thread) running the stores, and the interrupt context, which may occur on the same CPU or a different CPU as the other context. When you ask about multiple threads, do you mean multiple threads running the store method? Multiple overlapping interrupts? It is not clear to me, since the original question doesn't obviously involve threads. – BeeOnRope Nov 27 '18 at 17:49
@BeeOnRope So you are saying that without any thread creation, the interrupt can cause the signal handler to run in a different execution context, on a different CPU/core, as if by another thread? That's new to me and strange. – curiousguy Nov 27 '18 at 19:22
@curiousguy - no, I'm not trying to say that (systems that I'm _aware of_ will deliver the signal to the only thread if there is only one). In fact, I'm trying to more or less sidestep the whole issue of _signal_ handling semantics, and answer the question as asked, with any necessary caveats. Note that the OP didn't even talk about signals, but simply "interrupts". I don't know what type of system they are using, or how these interrupts are delivered. Originally I assumed the interrupt would actually interrupt the code in question, I wrote my answer that way (as in "you'll be fine"). – BeeOnRope Nov 27 '18 at 20:55
However, it occurred to me that this was incomplete: in the scenario where the interrupt runs _concurrently_ with the code doing the stores, it could certainly see them out of order. Multiple threads is an easy way to get interrupts (e.g., signals) concurrent with other code, but I doubt it is the _only_ way. If you are interested in digging further into interrupt or signal handler semantics, I recommend another question where you present your specific query and include details of the OS and hardware that interests you. – BeeOnRope Nov 27 '18 at 20:57
