
Suppose INTENABLE is a microcontroller's register that enables/disables interrupts, and I have it declared somewhere in my libraries as a volatile variable located at the appropriate address. my_var is some variable that is modified within one or more interrupts, as well as within my_func.

Within my_func I would like to do some operation on my_var that reads and then writes (such as +=) atomically (in the sense that it must happen entirely before or entirely after an interrupt: an interrupt cannot occur while it is going on).

What I would usually have then is something like this:

int my_var = 0;

void my_interrupt_handler(void)
{
    // ...

    my_var += 3;

    // ... 
}

int my_func(void)
{
    // ...

    INTENABLE = 0;
    my_var += 5;
    INTENABLE = 1;

    // ...
}

If I'm understanding things correctly, if my_var were declared volatile, then my_var would be guaranteed to be "cleanly" updated (which is to say that the interrupt would not update my_var in between my_func's read and write of it) because the C standard guarantees that volatile memory accesses happen in order.

The part I would like some confirmation on is when it is not declared volatile. Then, the compiler will not guarantee that the update happens with interrupts disabled, is that correct?

I'm wondering because I have written similar code (with non-volatile variables), with the difference that I disable interrupts through a function from another compilation unit (some library's file). If I am understanding things correctly, the likely actual reason that worked was that the compiler cannot assume the variable is not read or modified by calls outside the compilation unit. Therefore, if, say, I compiled with GCC's -flto, reordering outside the critical region (bad things) could happen. Do I have this right?


EDIT:

Thanks to Lundin's comment I realized in my head I had mixed together the case where I disable a peripheral's interrupt register with the case where I use a specific assembly instruction to disable all interrupts on the processor.

I would imagine the instruction that enables/disables processor interrupts would prevent other instructions from being reordered from before to after or from after to before itself, but I still do not know for sure whether that is true.

EDIT 2:

Regarding volatile accesses: because I wasn't clear on whether reordering around volatile accesses was something not allowed by the standard, something that was allowed but didn't happen in practice, or something that was allowed and did happen in practice, I came up with a small test program:

volatile int my_volatile_var;

int my_non_volatile_var;

void my_func(void)
{
    my_volatile_var = 1;
    my_non_volatile_var += 2;
    my_volatile_var = 0;
    my_non_volatile_var += 2;
}

Using arm-none-eabi-gcc version 7.3.1 to compile with -O2 for a Cortex-M0 (arm-none-eabi-gcc -O2 -mcpu=cortex-m0 -c example.c) I get the following assembly:

movs    r2, #1
movs    r1, #0
ldr     r3, [pc, #12]   ; (14 <my_func+0x14>)
str     r2, [r3, #0]
ldr     r2, [pc, #12]   ; (18 <my_func+0x18>)
str     r1, [r3, #0]
ldr     r3, [r2, #0]
adds    r3, #4
str     r3, [r2, #0]
bx      lr

Here you can clearly see that the two my_non_volatile_var += 2 statements were merged into a single addition (adds r3, #4), which happens after both volatile stores. This means that GCC does indeed reorder when optimizing (and I'm going to go ahead and assume this means it is allowed by the standard).
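For comparison, here is a minimal sketch of the same test with a compiler barrier placed around the first update. It assumes GCC/Clang extended asm; COMPILER_BARRIER, my_func_with_barriers and check_result are names I made up for illustration. The intent is that the first += can no longer be moved or merged across the barriers; I have not verified the generated assembly for this variant.

```c
#include <assert.h>

/* Assumption: GCC/Clang extended asm. An empty asm statement with a
   "memory" clobber emits no instructions, but tells the compiler that
   memory may be read or written at this point. */
#define COMPILER_BARRIER() __asm__ volatile("" : : : "memory")

volatile int my_volatile_var;
int my_non_volatile_var;

void my_func_with_barriers(void)
{
    my_volatile_var = 1;
    COMPILER_BARRIER();          /* the += below must stay after this point */
    my_non_volatile_var += 2;
    COMPILER_BARRIER();          /* ...and before this point */
    my_volatile_var = 0;
    my_non_volatile_var += 2;    /* not constrained relative to later code */
}

/* Host-side check: the barriers constrain ordering, not the final values. */
void check_result(void)
{
    my_non_volatile_var = 0;
    my_func_with_barriers();
    assert(my_non_volatile_var == 4);
    assert(my_volatile_var == 0);
}
```

Compiling this with the same command line as above and comparing the disassembly would be the way to confirm the effect on a given compiler version.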

curiousguy
tlongeri
  • There won't be a guarantee for `my_var` to even exist. – Eugene Sh. Dec 13 '18 at 18:00
  • What's the -flto flag? – Philip Dec 13 '18 at 18:14
  • It's kind of hard to speculate about this, as all you appear to have is another volatile as protection mechanism. Where and how are interrupts disabled? Are we to replace `INTENABLE` with a clear/set of the global interrupt mask, or what? – Lundin Dec 13 '18 at 18:15
  • @EugeneSh. my_var is a global variable. Why it is not guaranteed ? – Angen Dec 13 '18 at 18:15
  • @Angen Because if it has no observable side effect, it can get easily optimized away. – Eugene Sh. Dec 13 '18 at 18:16
  • @Lundin Yes, `INTENABLE` is meant to be some sort of global interrupt mask, that is what I meant by "Suppose `INTENABLE` is a microcontroller's register that enables/disables interrupts". I'm sorry if that wasn't clear – tlongeri Dec 13 '18 at 18:29
  • Well, there's a bit of a difference if it is a register disabling interrupts of a specific hardware peripheral, or if it is the global interrupt mask. Because in case of the latter, it would have to be something boiling down to inline assembler, and a sane compiler can't really re-order or optimize out inline assembler. – Lundin Dec 13 '18 at 18:34
  • @EugeneSh. Well, I didn't mean for that code snippet to be the entirety of the code. I will update my question to better reflect that some interrupts modify it, but assuming that `my_var` is somehow relevant to the code, my concern is that it is not modified in between the read and write of the `+=` operation (or, well, that the observable behavior is that it wasn't altered in between them). – tlongeri Dec 13 '18 at 18:36
  • @TAL33 My comment still holds. The compiler is free to eliminate this variable as long as a calculation it is used in is still producing the same result. – Eugene Sh. Dec 13 '18 at 18:37
  • @Lundin Ah, of course. I dumbly had mixed up both cases. So what I said would apply if I am disabling interrupts at some peripheral, but likely not if I am doing it through the processor's register. That was very helpful, thank you! – tlongeri Dec 13 '18 at 18:45
  • @Philip My basic understanding is that it basically optimizes across compilation unit (preprocessed file) boundaries (i.e. can optimize while seeing the entire code, instead of just one file at a time). But don't take my word for it, I think you can find a better explanation by googling. – tlongeri Dec 13 '18 at 18:55
  • `volatile` does not guarantee atomicity at all! Well, such things are left to the implementation anyway. But standard does not guarantee it. – Antti Haapala -- Слава Україні Dec 13 '18 at 19:46
  • @AnttiHaapala Yes, in the question what I meant was that I wanted to perform an atomic operation on the non-volatile variable in the sense that it must always happen before or after interrupts modify the variable. I updated my question to be clearer. – tlongeri Dec 13 '18 at 20:57
  • @AnttiHaapala Usually alignment guarantees that simple assembly operations (loads, stores) are atomic. There is no way to generate non-atomic code for these operations that isn't also less efficient, so no compiler would generate non-atomic load/store operations; and the ABI requires alignment for anything that is visible to separately compiled code. – curiousguy Dec 13 '18 at 21:18
  • @EugeneSh. "_There won't be a guarantee for my_var to even exist_" Are you saying that an object used in an async signal handler isn't guaranteed to exist? That doesn't sound right. – curiousguy Dec 13 '18 at 21:22
  • @curiousguy The compiler has no idea about handlers, it is just seeing functions not being called. If you want to tell it that something is needed which it doesn't see, you use `volatile`. – Eugene Sh. Dec 13 '18 at 21:26
  • @EugeneSh. Wrong. Tell me how the system knows which function to call asynchronously. – curiousguy Dec 13 '18 at 21:27
  • I looked into the assembly for the library I was using, for a Cortex-M0, and the interrupt disable for GCC looks like `__asm volatile ("cpsid i" : : : "memory");`, so I'm informing myself of what the volatile keyword and `"memory"` mean in this context. – tlongeri Dec 13 '18 at 21:32
  • @curiousguy System? Or compiler? These are two very different things. The compiler (+linker) places the function somewhere in memory, and a branch instruction at the corresponding interrupt vector to jump to this function. Its job is done. Again, function and branch instruction. No notion of handler whatsoever. From here it's up to the hardware. Also note, there is *no* notion of interrupts or handlers in the C standard (except the very limited paragraph 5.2.3). – Eugene Sh. Dec 13 '18 at 21:35
  • @TAL33 `"memory"` is *not* well specified but intuitively it means calling a separately compiled function that can access any globally accessible object; it cannot access local variable not shared with global state. – curiousguy Dec 13 '18 at 21:45
  • @EugeneSh. So the compiler has placed the address of the function in some "vector". It probably will *not* optimize out the function or anything accessed in the function. – curiousguy Dec 13 '18 at 21:51
  • @curiousguy It will, unless you tell it not to. Don't forget that the interrupt handler can have nested function calls as well. How can compiler know it should not optimize these too? – Eugene Sh. Dec 13 '18 at 21:53
  • @EugeneSh. How can the compiler know that it can optimize away anything? Does the compiler somehow believe that the "interrupt vector" is some useless global variable that is never used for any purpose, and that if an address is written to a "vector" it is never consumed for any purpose? – curiousguy Dec 13 '18 at 21:54
  • @curiousguy Read the C standard section [5.1.2.3](http://port70.net/~nsz/c/c11/n1570.html#5.1.2.3) and specifically [p4](http://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p4) and [p6](http://port70.net/~nsz/c/c11/n1570.html#5.1.2.3p6) – Eugene Sh. Dec 13 '18 at 22:00
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/185228/discussion-between-curiousguy-and-eugene-sh). – curiousguy Dec 14 '18 at 00:25

3 Answers

1

C/C++ volatile has a very narrow range of guaranteed uses: interacting with the outside world directly (signal handlers written in C/C++ count as "outside" when they are called asynchronously). That is why volatile object accesses are defined as observable behavior, just like console I/O and the exit value of the program (the return value of main).

A way to see it is to imagine that any volatile access is actually translated into I/O on a special console, terminal, or pair of FIFO devices named Accesses and Values, where:

  • a volatile write x = v; to object x of type T is translated to writing to the FIFO Accesses a write order specified as a 4-tuple ("write", T, &x, v)
  • a volatile read (lvalue to rvalue conversion) of x is translated to writing to Accesses a 3-tuple ("read", T, &x) and waiting for the value on Values.

This way, volatile is exactly like an interactive console.
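To make the model concrete, here is a toy sketch in C of the Accesses FIFO for objects of type int. Everything in it (access_record, trace_push, model_volatile_write, model_volatile_read and so on) is a hypothetical name for illustration only; no compiler works this way internally. It just records each modelled volatile access, in program order, as the tuples described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the "Accesses" FIFO for objects of type int. */
enum access_kind { ACCESS_READ, ACCESS_WRITE };

struct access_record {
    enum access_kind kind;  /* "read" or "write" */
    const char *type;       /* T */
    const void *addr;       /* &x */
    int value;              /* v, meaningful for writes only */
};

#define TRACE_CAPACITY 16
static struct access_record trace[TRACE_CAPACITY];
static size_t trace_len;

static void trace_push(enum access_kind kind, const char *type,
                       const void *addr, int value)
{
    assert(trace_len < TRACE_CAPACITY);
    trace[trace_len].kind = kind;
    trace[trace_len].type = type;
    trace[trace_len].addr = addr;
    trace[trace_len].value = value;
    trace_len++;
}

/* Models "x = v;" on a volatile int: emit ("write", T, &x, v). */
void model_volatile_write(int *x, int v)
{
    trace_push(ACCESS_WRITE, "int", x, v);
    *x = v;
}

/* Models a read of a volatile int: emit ("read", T, &x), then take
   the value (here simply from memory, standing in for "Values"). */
int model_volatile_read(const int *x)
{
    trace_push(ACCESS_READ, "int", x, 0);
    return *x;
}

size_t trace_length(void) { return trace_len; }

const struct access_record *trace_at(size_t i)
{
    assert(i < trace_len);
    return &trace[i];
}
```

The sequence of records is the only thing an outside observer gets to see in this model; whatever happens to ordinary objects between two records is invisible.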

A nice specification of volatile is the ptrace semantic (that nobody but me uses, but it's still the nicest volatile specification ever):

  • a volatile variable can be examined by the debugger/ptrace after the program has been stopped at a well defined point;
  • any volatile object access is a set of well defined PC (program counter) points such that a breakpoint can be set there(**): an expression doing a volatile access translates to a set of addresses in the code where breaking causes a break at a defined C/C++ expression;
  • the state of any volatile object can be modified in arbitrary ways(*) with ptrace when the program is stopped, limited only to the legal values of the object in C/C++; changing the bit pattern of a volatile object with ptrace is equivalent with adding an assignment expression in the C/C++ at the C/C++ well defined breakpoint, so it's equivalent with changing C/C++ source code at run time.

It means that you have a well defined ptrace observable state of the volatile objects at these points, period.

(*) But you may not set a volatile object to an invalid bit pattern with ptrace: the compiler can assume that any object has a legal bit pattern as defined by the ABI. All uses of ptrace to access volatile state must follow the ABI specification of objects shared with separately compiled code. For example a compiler can assume that a volatile number object doesn't have a negative zero value if the ABI doesn't allow it. (Obviously negative zero is a valid state, semantically distinct from positive zero, for IEEE floats.)

(**) Inlining and loop unrolling can generate many points in assembly/binary code corresponding to a unique C/C++ point; debuggers handle that by setting many PC level breakpoints for one source level breakpoint.

ptrace semantic doesn't even imply that a volatile local variable is stored on the stack and not in a register; it implies that the location of the variable, as described in the debugging data, is modifiable either in addressable memory via its stable address in the stack (stable for the duration of the function call, obviously) or in the representation of the saved registers of a paused program, which is a temporary complete copy of the registers as saved by the scheduler when a thread of execution is paused.

[In practice all compilers provide a stronger guarantee than ptrace semantic: that all volatile objects have a stable address even if their address is never taken in C/C++ code; this guarantee is sometimes not useful and strictly pessimistic. The lighter ptrace semantic guarantee is extremely useful in itself for automatic variable in register in "high level assembly".]

You can't examine a running program (or thread) without stopping it; you cannot observe from any CPU without synchronization (ptrace provides such synchronization).

These guarantees hold at any optimization level. At minimum optimization, all variables are in fact practically volatile and the program can be stopped at any expression.

At higher optimization levels, computations are reduced and variables can even be optimized out if they hold no useful information for any legal run; the most obvious case is a "quasi const" variable, which isn't declared const but is used as if const: set once and never changed. Such a variable carries no information at runtime if the expression that was used to set it can be recomputed later.

Many variables that carry useful information still have a limited range: if there is no expression in a program that can set a signed integer type to a mathematically negative result (a result that is truly negative, not negative because of overflow in a two's complement system), the compiler can assume that they don't have negative values. Any attempt to set these to a negative value in the debugger or via ptrace would be unsupported, as the compiler can generate code that integrates the assumption; making the object volatile would force the compiler to allow any possible legal value for the object, even if only assignments of positive values are present in the complete code (the code in all paths that can access that object, in every TU (translation unit) that can access the object).

Note that for any object that is shared beyond the set of collectively translated code (all TU that are compiled and optimized together), nothing about the possible values of the object can be assumed beside the applicable ABI.

The trap (not trap as in computing) is to expect Java volatile-like semantics in at least single CPU, linear, ordered programming (where there is by definition no out-of-order execution, as there is only one POV on the state, that of the one and only CPU):

int *volatile p = 0;
p = new int(1);

There is no volatile guarantee that p can only be null or point to an object with value 1: there is no volatile ordering implied between the initialization of the int and the setting of the volatile object, so an async signal handler or a breakpoint on the volatile assignment may not see the int initialized.

But the volatile pointer may not be modified speculatively: until the compiler obtains the guarantee that the rhs (right hand side) expression will not throw an exception (thus leave p untouched), it cannot modify the volatile object (as a volatile access is an observable by definition).

Going back to your code:

INTENABLE = 0; // volatile write (A)
my_var += 5;  // normal write
INTENABLE = 1; // volatile write (B)

Here INTENABLE is volatile, so all accesses to it are observable and the compiler must produce exactly those side effects. The normal writes are internal to the abstract machine: the compiler only needs to preserve them to the extent required to produce the correct result, without accounting for any signals, which are outside the abstract semantics of C/C++.

In terms of ptrace semantics, you can set a breakpoint at points (A) and (B) and observe or change the value of INTENABLE, but that's all. my_var may not be optimized out completely, as it is accessible by outside code (the signal handling code), but nothing else in that function can access it, so the concrete representation of my_var doesn't have to match its value according to the abstract machine at that point.

It's different if you have a call to a truly external (not analyzable by the compiler, outside the "collectively translated code") do-nothing function on each side of the update:

INTENABLE = 0; // volatile write (A)
external_func_1(); // actual NOP, but can access my_var
my_var += 5;  // normal write
external_func_2(); // actual NOP, but can access my_var
INTENABLE = 1; // volatile write (B)

Note that both of these calls to do-nothing-possibly-do-anything external functions are needed:

  • external_func_1() possibly observes the previous value of my_var
  • external_func_2() possibly observes the new value of my_var

These calls are to external, separately compiled NOP functions that have to be made according to the ABI; thus all globally accessible objects must carry the ABI representation of their abstract machine value: the objects must reach their canonical state, unlike the optimized state where the optimizer knows that some concrete memory representation of some objects have not reached the value of the abstract machine.

In GCC such a do-nothing external function can be spelled asm volatile("" : : : "memory");. The "memory" clobber is vaguely specified but clearly means "accesses anything in memory whose address has been leaked globally". (A bare asm(""); is sometimes used too, but it has no clobber list, so only the "memory" form is documented as telling the compiler that memory may be read or written.)

[See, here I'm relying on the transparent intent of the specification and not on its words, as the words are very often badly chosen(#) and not used by anyone to build an implementation anyway; only the opinion of people counts, the words never do.
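Putting the pieces together, here is a minimal sketch of the critical section with the two external calls replaced by such empty asm statements. It assumes GCC/Clang extended asm. INTENABLE is a plain volatile int standing in for the real register so that the sketch can be built on a host, and critical_enter, critical_exit and add_five_protected are hypothetical names. On a real target the enable/disable would be whatever the vendor library provides.

```c
#include <assert.h>

/* Assumption: GCC/Clang extended asm; emits no instructions. */
#define COMPILER_BARRIER() __asm__ volatile("" : : : "memory")

/* Stand-in for the hardware register. On a real target this would be
   a volatile object located at the register's address. */
static volatile int INTENABLE = 1;

int my_var;

static void critical_enter(void)
{
    assert(INTENABLE == 1);  /* this sketch does not support nesting */
    INTENABLE = 0;           /* volatile write (A) */
    COMPILER_BARRIER();      /* plays the role of external_func_1() */
}

static void critical_exit(void)
{
    COMPILER_BARRIER();      /* plays the role of external_func_2() */
    INTENABLE = 1;           /* volatile write (B) */
}

void add_five_protected(void)
{
    critical_enter();
    my_var += 5;             /* normal read-modify-write */
    critical_exit();
}

int interrupts_enabled(void)
{
    return INTENABLE;
}
```

The barriers only constrain the compiler. Whether the write to the register takes effect before the next instruction executes is a hardware question that this sketch does not address.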

(#) at least in the world of common programming languages where people don't have the qualification to write formal or even correct specifications. ]

curiousguy
0

Without interrupts, I think you're safe from the scheduler switching away and something changing your variable behind your back. But down to the nitty-gritty, that probably depends on the computer architecture. It's true for the typical x86.

An additional gotcha with non-volatile variables is that the compiler will optimize away variable reads if it thinks there's no way it could change, which will happen with or without interrupts in that section. But unless the variable is volatile in nature, like an input pin, that "shouldn't" break the critical section.

Short answer: Being in a critical section won't save your non-volatile variable from the optimizer.
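The classic case of an optimized-away read is a flag that is polled in a loop and set asynchronously. Here is a small host-side sketch using a standard C signal handler as a stand-in for an interrupt; data_ready, on_signal, wait_for_data and run_demo are made-up names. Without volatile on the flag, the compiler would be allowed to load it once and spin forever.

```c
#include <assert.h>
#include <signal.h>

/* Written by the handler, polled by the main code. volatile forces a
   fresh load on every iteration of the polling loop. */
static volatile sig_atomic_t data_ready;

static void on_signal(int signo)
{
    (void)signo;
    data_ready = 1;
}

/* Spins until the handler has run. */
void wait_for_data(void)
{
    while (!data_ready) {
        /* busy-wait */
    }
}

/* Installs the handler, triggers it, and waits for the flag.
   Returns the final value of the flag. */
int run_demo(void)
{
    void (*previous)(int);

    data_ready = 0;
    previous = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);

    raise(SIGINT);     /* the handler runs before raise() returns */
    wait_for_data();
    return (int)data_ready;
}
```

This only addresses the "read gets optimized away" problem. It does not make a read-modify-write such as += safe against the handler; that still needs the interrupt to be masked around it.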

Philip
  • This is tagged embedded and the OP speaks of microcontrollers. Bringing up x86 is off-topic. What "scheduler"? – Lundin Dec 13 '18 at 18:22
  • Exactly. Which is why it would depend on architecture. And/or which RTOS you're using. If you have one. Most handle concurrent processing via interrupts. "most". – Philip Dec 13 '18 at 19:03
0

There are several things of concern here.

Instruction re-ordering

Regarding instruction re-ordering as part of optimizations, the compiler is not allowed to do that across volatile variable access. A volatile variable is evaluated "strictly according to the rules of the abstract machine", meaning in practice that at the sequence point at the end of the volatile access expression, everything before that expression must be evaluated.

In this regard, inline assembler can most likely be regarded as safe from re-ordering too. Any compiler re-ordering or optimizing away manually written assembler is broken and unsuitable for embedded systems programming.

This means that if the interrupt enable/disable in your example boils down to setting/clearing the global interrupt mask, as some form of inline assembler macro, then the compiler can't very well re-order it. If it's an access to a hardware register, then that will (hopefully) be volatile-qualified and can't be re-ordered either.

This means that stuff between the inline assembler instructions/volatile accesses is safe from re-ordering in relation to the inline assembler/volatile access, but not in relation to anything else.

Optimizing away variables shared with ISRs/with no visible side effects

This is mostly answered here. In your specific example, my_var has no notable side effects and may be optimized away. The same goes if it is modified from an interrupt. This is the greater danger here, as the inline asm/volatile accesses surrounding the non-volatile variable access don't matter in the slightest.

With a "spaghetti globals"/external linkage design, the compiler might indeed be blocked from making various assumptions when optimizing. I'm not entirely sure what link-time optimization of gcc would mean here, but if you tell the linker not to worry about other translation units spaghetti-accessing, then indeed I think bad things might happen. Not because of re-ordering, but because of general "no side-effect" optimization. Although arguably, this is the least of your worries if you spew extern all over the program.


If you don't have optimizations enabled, then you are fairly safe. If you have, then generally embedded system compilers are quite forgiving and don't do too aggressive optimizations. gcc is another story though, and is keen to cause havoc in embedded software at -O2 or -O3, particularly when your code contains some manner of poorly-specified behavior.

Lundin
  • After doing some more research I think you are wrong in the case of volatile accesses. [This link](https://www.embedded.com/print/4442490) and [this one](https://www.embedded.com/electronics-blogs/break-points/4027634/Disabling-Interrupts) say that non-volatile accesses can be reordered around volatile accesses. I ended up making a small program myself that shows reordering around volatile variables with GCC and `-O2`: I will add it to the question as an edit. – tlongeri Dec 13 '18 at 20:30
  • "_Regarding instruction re-ordering as part of optimizations, the compiler is not allowed to do that across volatile variable access._" That statement is **not even wrong**: in term of abstract C/C++ semantics, it is meaningless. A volatile write has no relation with the virtual machine: the state of non volatile objects isn't even observable at that point. – curiousguy Dec 13 '18 at 21:26
  • @curiousguy C17 5.1.2.3 §6 Definition of observable behavior: "Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine." Which is earlier defined in §4 as "In the abstract machine, all expressions are evaluated as specified by the semantics". So I'm not quite sure what you mean. – Lundin Dec 14 '18 at 15:25
  • @Lundin 1) The volatile operations are done concretely exactly as described by the abstract machine. 2) The non-volatile operations are not observable. So a) In terms of the abstract machine, all statements are executed according to the ... abstract machine. There is no "reordering". b) In the real program code, only volatile operations, I/O and `exit` are observable. There is no "reordering" of non-volatile operations, they don't even have to happen. **They are not observable; they are only a means to an end: the observable operations.** – curiousguy Dec 14 '18 at 20:45
  • **The abstract machine is a math tool** to describe semantics, it doesn't exist. I know the std text is very confusing so it's probably better to forget it and let its intent explain itself. The intent of the C/C++ spec is to 1) allow the implementation any transformation that preserves program behavior, as any high level spec 2) allow low level programming with precise control of operations that communicate with other component (hardware, or software components), as any low level language needs to. So the spec allows any transformation that preserves the trace of volatile ops. – curiousguy Dec 14 '18 at 20:51
  • The presence of a volatile operation somewhere hopefully doesn't prevent all transformations around it. But the way volatile is specified, a volatile write doesn't even have "release" semantics on global variables (or writes to shared memory), even in the single CPU configuration. Getting the correct intuition for volatile isn't easy. I used to misunderstand it too. – curiousguy Dec 14 '18 at 21:08
  • @curiousguy "evaluated as specified by the semantics" means that all parts of the standard labelled "semantics" (most of it) are normative and cannot be ignored. This includes sequence points. When they appear in an expression containing a volatile access, they are to behave as sequence points. All previous evaluations have to be done before the sequence point. This means that _the result_ of non-volatile operations is not allowed to be wildly moved across that sequence point. – Lundin Dec 16 '18 at 12:49
  • It's not easy to interpret the standard here. In addition compiler vendors like to misinterpret it, because they focus way too much on optimizations and not enough on program safety. And so pretty much all compilers have been found non-compliant in their implementation of volatile over the years, resulting in updates and fixes. – Lundin Dec 16 '18 at 12:52
  • @Lundin "_All previous evaluations have to be done before the sequence point_" What would that mean? Would that apply to non volatile local variables? – curiousguy Dec 16 '18 at 17:47
  • @curiousguy C17 5.1.2.3 §3 "The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B." If B is an expression with volatile access, then it has to be evaluated strictly to these rules. Meaning that A is guaranteed to be sequenced before B, no matter if A contains a volatile access or not. – Lundin Dec 17 '18 at 07:29
  • @Lundin It's sequenced but what does that mean concretely? – curiousguy Dec 17 '18 at 08:04
  • @curiousguy What I just wrote. – Lundin Dec 17 '18 at 08:45
  • @Lundin So concretely it means nothing. The abstract machine is not observable. – curiousguy Dec 17 '18 at 09:20
  • @curiousguy No, concretely it means "If B is an expression with volatile access, then it has to be evaluated strictly to these rules. Meaning that A is guaranteed to be sequenced before B, no matter if A contains a volatile access or not". – Lundin Dec 17 '18 at 09:26
  • @Lundin How is that "concrete"? Which variable are observable? – curiousguy Dec 17 '18 at 18:29