C99 "atomic" load in baremetal portable library

Question

I'm working on a portable library for baremetal embedded applications.

Assume that I have a timer ISR that increments a counter and that, in the main loop, this counter is read with a load that is most certainly not atomic.

I'm trying to ensure load consistency (i.e. that I'm not reading garbage because the load was interrupted and the value changed) without resorting to disabling interrupts. It does not matter if the value changed after reading the counter as long as the read value is proper. Does this do the trick?

uint32_t read(volatile uint32_t *var){
    uint32_t value;
    do { value = *var; } while(value != *var);
    return value;
}

For atomicity, maybe it is better to have a lock, e.g. a global flag: while it is set, the read operation loops until it is unset. The lock is set by the interrupt handler. — Nikos M., Jun 16 '20 at 15:53
@NikosM. This won't fit in the library purpose. Basically, I'm trying to implement condition variables in plain C99. — André Medeiros, Jun 16 '20 at 15:56
I don't think it would work. Isn't the conditional also loading the value in order to check it? meaning it is subject to any issues that the actual load is subject to? — Chris Rollins, Jun 16 '20 at 16:45
@ChrisRollins The idea is that if I can read the same value twice in a row, it is because the load was not interrupted (or it was interrupted but its value hasn't changed) — André Medeiros, Jun 16 '20 at 17:02
Is the machine actually single-processor? Of course the interrupt could fire at any time, but are you guaranteed that whenever it does fire, the ISR runs to completion before the main thread of execution gets control back? — Nate Eldredge, Jun 16 '20 at 17:03
@NateEldredge The library must be portable, so I am not assuming any specific architecture. On the second question, it is sort of guaranteed. Synchronising stores from threads/nested ISRs is the responsibility of the client code, but the library will only ever load the value from the main loop. — André Medeiros, Jun 16 '20 at 17:06
So, for instance, is it possible that the ISR runs on a different CPU in parallel with the main loop, and that the store in the ISR requires multiple instructions, and they get interleaved with the main loop's load instructions? — Nate Eldredge, Jun 16 '20 at 17:27
Basically I think it's hopeless to try to do this in a completely portable manner; one can always imagine some bizarre architecture for which it would fail. — Nate Eldredge, Jun 16 '20 at 17:28
C99 cannot answer this question because you cannot even ask it using C99 vocabulary. — n. m. could be an AI, Jun 16 '20 at 21:06

Kuba hasn't forgotten Monica · Answer 1 · 2020-06-17T01:07:48.477

It's highly unlikely that there's any sort of a portable solution for this, not least because plenty of C-only platforms are really C-only and use one-off compilers, i.e. nothing mainstream and modern-standards-compliant like gcc or clang. So if you're truly targeting entrenched C, then it's all quite platform-specific and not portable - to the point where "C99" support is a lost cause. The best you can expect for portable C code is ANSI C support - referring to the very first non-draft C standard published by ANSI. That is still, unfortunately, the common denominator - that major vendors get away with. I mean: Zilog somehow gets away with it, even if they are now but a division of Littelfuse, formerly a division of IXYS Semiconductor that Littelfuse had acquired.

For example, here are some compilers where there's only a platform-specific way of doing it:

Zilog eZ8 using a "recent" Zilog C compiler (anything 20 years old or less is OK): 8-bit value read-modify-write is atomic. 16-bit operations where the compiler generates word-aligned word instructions like LDWX, INCW, DECW are atomic as well. If the read-modify-write otherwise fits into 3 instructions or less, you'd prepend the operation with asm("\tATM");. Otherwise, you'd need to disable the interrupts: asm("\tPUSHF\n\tDI");, and subsequently re-enable them: asm("\tPOPF");.
Zilog ZNEO is a 16 bit platform with 32-bit registers, and read-modify-write accesses on registers are atomic, but memory read-modify-write round-trips via a register, usually, and takes 3 instructions - thus prepend the R-M-W operation with asm("\tATM").
Zilog Z80 and eZ80 require wrapping the code in asm("\tDI") and asm("\tEI"), although this is valid only when it's known that the interrupts are always enabled when your code runs. If they may not be enabled, then there's a problem since Z80 does not allow reading the state of IFF1 - the interrupt enable flip-flop. So you'd need to save a "shadow" of its state somewhere, and use that value to conditionally enable interrupts. Unfortunately, eZ80 does not provide an interrupt controller register that would allow access to IEF1 (eZ80 uses the IEFn nomenclature instead of IFFn) - so this architectural oversight is carried over from the venerable Z80 to the "modern" one.

Those aren't necessarily the most popular platforms out there, and many people don't bother with Zilog compilers due to their fairly poor quality (low enough that yours truly had to write an eZ8-targeting compiler*). Yet such odd corners are the mainstay of C-only code bases, and library code has no choice but to accommodate this, if not directly then at least by providing macros that can be redefined with platform-specific magic.

E.g. you could provide empty-by-default macros MYLIB_BEGIN_ATOMIC(vector) and MYLIB_END_ATOMIC(vector) that would be used to wrap code that requires access atomic with respect to a given interrupt vector (or e.g. -1 if with respect to all interrupt vectors). Naturally, replace MYLIB_ with a "namespace" prefix specific to your library.
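A minimal sketch of what such hooks could look like, assuming the library ships a header that the client may override before including it. The names `MYLIB_ALL_VECTORS` and `mylib_load_u32` are hypothetical and only illustrate the idea; the default expansion does nothing, so it provides no protection by itself.

```c
#include <stdint.h>

/* Hypothetical hooks using the MYLIB_ prefix suggested above.
 * They default to no-ops; the client redefines them with
 * platform-specific code before this point. */
#ifndef MYLIB_BEGIN_ATOMIC
#define MYLIB_BEGIN_ATOMIC(vector) ((void)(vector))
#endif
#ifndef MYLIB_END_ATOMIC
#define MYLIB_END_ATOMIC(vector) ((void)(vector))
#endif

/* Hypothetical constant: atomic with respect to all interrupt vectors. */
#define MYLIB_ALL_VECTORS (-1)

/* Hypothetical library helper: loads a 32-bit value inside the
 * client-defined atomic section. */
static uint32_t mylib_load_u32(const volatile uint32_t *var, int vector)
{
    uint32_t value;
    MYLIB_BEGIN_ATOMIC(vector);
    value = *var;
    MYLIB_END_ATOMIC(vector);
    return value;
}
```

On a Z80, for instance, the client could define the two macros as `asm("\tDI")` and `asm("\tEI")`, subject to the caveat above that interrupts must be known to be enabled when the code runs.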

To enable platform-specific optimizations such as ATM vs DI on "modern" Zilog platforms, an additional argument could be provided to the macro to distinguish the presumed "short" operations, for which the compiler is apt to generate sequences of at most three instructions, from longer ones. Such micro-optimization usually requires an audit of the assembly output (easily automatable) to verify the assumed instruction sequence length, but at least the data to drive the decision would be available, and the user would have a choice of using it or ignoring it.

* If some lost soul wants to know anything bordering on the arcane re. eZ8 - ask away. I know entirely too much about that platform, in details so gory that even modern Hollywood CG and SFX would have a hard time reproducing the true depth of the experience on-screen. I'm also possibly the only one out there running the 20MHz eZ8 parts occasionally at 48MHz clock - as sure a sign of demonic possession as the multiverse allows. If you think it's outrageous that such depravity makes it into production hardware - I'm with you. Alas, business case is business case, laws of physics be damned.

Thanks for the clarifications. By portable I meant a library that can be built by a standard-compliant compiler rather than a library that can be built on every platform. I am trying to abstract away platform-specific details, though, as I know they can't really be avoided. I guess I'm implementing atomic as a set of macros and leaving the actual implementation to the client. — André Medeiros, Jun 17 '20 at 22:41

Michael Dorgan · Accepted Answer · 2020-06-16T18:48:02.220

Are you running on any systems where uint32_t is larger than what a single assembly instruction can read or write? If not, the IO to memory should be a single instruction and therefore atomic (assuming the bus is also word sized...). You get in trouble when the compiler breaks it up into multiple smaller reads/writes. Otherwise, I've always had to resort to DI/EI. You could have the user configure your library so that it knows whether atomic instructions or a minimum 32-bit word size are available, which avoids interrupt twiddling. If you have these guarantees, you don't need the verification code.

To answer the question though, on a system that must split the read/writes, your code is not safe. Imagine a case where you read your value in correctly in the "do" part, but the value gets split during the "while" part check. Further, in an extreme case, this is an infinite loop. For complete safety, you'd need a retry count and error condition to prevent that. The loop case is extreme for sure, but I'd want it just in case. That of course makes the run time longer.
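A retry count with an error condition could be sketched as below. The name `read_bounded` and its signature are hypothetical. Note that this only bounds the loop; it does not make the double-read check itself safe against split reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true and stores the value in *out if
 * two consecutive reads agree within max_tries attempts; returns
 * false (leaving *out untouched) otherwise. */
static bool read_bounded(const volatile uint32_t *var, uint32_t *out,
                         unsigned max_tries)
{
    while (max_tries-- > 0u) {
        uint32_t first = *var;   /* candidate value */
        uint32_t second = *var;  /* verification read */
        if (first == second) {
            *out = first;
            return true;
        }
    }
    return false;                /* caller must handle the error */
}
```

The caller then decides what a failure means, e.g. falling back to disabling interrupts for a single read.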

Let's show a failure case as an example, using 16-bit numbers on a machine that reads 8 bits at a time to make it easier to follow:

Value to read from memory *var is 0x1234
Read 8-bit 0x12
*var becomes 0x5678
Read 8-bit 0x78 - value is now 0x1278 (invalid)
*var becomes 0x1234
Verification step reads 8-bit 0x12
*var becomes 0x5678
Verification reads 8-bit 0x78

The value 0x1278 is confirmed as correct, but this is an error, as *var only ever held 0x1234 and 0x5678.
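The trace above can be reproduced on a host machine with a small simulation. This is only a sketch: the 16-bit variable is modelled as two bytes, and `fake_isr`, `torn_load` and `read_twice` are hypothetical names. The simulated ISR is made to fire after every byte read, which is the worst-case timing assumed in the trace.

```c
#include <stdint.h>

/* Simulated memory for *var, initially 0x1234. */
static uint8_t mem_hi = 0x12, mem_lo = 0x34;

/* Simulated ISR: flips the variable between 0x1234 and 0x5678. */
static void fake_isr(void)
{
    if (mem_hi == 0x12) { mem_hi = 0x56; mem_lo = 0x78; }
    else                { mem_hi = 0x12; mem_lo = 0x34; }
}

/* A 16-bit load split into two 8-bit reads; the simulated ISR
 * fires after each byte read. */
static uint16_t torn_load(void)
{
    uint16_t hi = mem_hi;
    fake_isr();
    uint16_t lo = mem_lo;
    fake_isr();
    return (uint16_t)((hi << 8) | lo);
}

/* Same double-read check as in the question, applied to torn_load(). */
static uint16_t read_twice(void)
{
    uint16_t value;
    do { value = torn_load(); } while (value != torn_load());
    return value;
}
```

With this timing, `read_twice()` returns 0x1278 on the first pass, a value the variable never held.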

Another failure case would be when *var just happens to change at the same frequency as your code is running, which could lead to an infinite loop as each verification fails. Or even if it did break out eventually, this would be a very hard to track performance bug.

I don't see the problem in the case you mention. If the value was read correctly in the `do` part, then nothing bad can happen. If the test in the `while` returns true, for whatever reason, then you return a value which by assumption was correct. If it returns false, for whatever reason, you simply try again and no harm is done. — Nate Eldredge, Jun 16 '20 at 18:29
If the ISR is really only incrementing `*var` by 1, then in some sense the result 0x1278 is not really wrong, as `*var` really did equal 0x1278 at some instant during the execution of our `read()` function. I tried a little to come up with an example where this is not the case, but I couldn't. — Nate Eldredge, Jun 17 '20 at 01:44
That's exactly the type of corner case I couldn't figure out and it is absolutely plausible. The timer was just an example, but this situation actually happens in other places in my lib where the sequential incrementing assumption may not be true. Thanks. — André Medeiros, Jun 17 '20 at 22:43
