Interlocked.CompareExchange instruction reodering of the initialvalue

Question

Iam wondering if its possible that the initialvalue in the following code can be reordered to be after the computation resulting in undefined behavior.

The following example is taken from https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked.compareexchange?view=netframework-4.8

public class ThreadSafe
{
    // Field totalValue contains a running total that can be updated
    // by multiple threads. It must be protected from unsynchronized 
    // access.
    private float totalValue = 0.0F;

    // The Total property returns the running total.
    public float Total { get { return totalValue; }}

    // AddToTotal safely adds a value to the running total.
    public float AddToTotal(float addend)
    {
        float initialValue, computedValue;
        do
        {
            // Save the current running total in a local variable.
            initialValue = totalValue;
            //Do we need a memory barrier here??
            // Add the new value to the running total.
            computedValue = initialValue + addend;

            // CompareExchange compares totalValue to initialValue. If
            // they are not equal, then another thread has updated the
            // running total since this loop started. CompareExchange
            // does not update totalValue. CompareExchange returns the
            // contents of totalValue, which do not equal initialValue,
            // so the loop executes again.
        }
        while (initialValue != Interlocked.CompareExchange(ref totalValue, 
            computedValue, initialValue));
        // If no other thread updated the running total, then 
        // totalValue and initialValue are equal when CompareExchange
        // compares them, and computedValue is stored in totalValue.
        // CompareExchange returns the value that was in totalValue
        // before the update, which is equal to initialValue, so the 
        // loop ends.

        // The function returns computedValue, not totalValue, because
        // totalValue could be changed by another thread between
        // the time the loop ends and the function returns.
        return computedValue;
    }
}

Is a memory barrier needed between assiging totalvalue to initialvalue and the actual computation?

As I currently understand without a barrier it could be optimized in a way that removes the initialvalue resulting in thread safety issues because the computedValue is could be calculated with a stale value but the CompareExchange will no longer detect this:

    public float AddToTotal(float addend)
    {
        float computedValue;
        do
        {
            // Add the new value to the running total.
            computedValue = totalValue + addend;

            // CompareExchange compares totalValue to initialValue. If
            // they are not equal, then another thread has updated the
            // running total since this loop started. CompareExchange
            // does not update totalValue. CompareExchange returns the
            // contents of totalValue, which do not equal initialValue,
            // so the loop executes again.
        }
        while (totalValue != Interlocked.CompareExchange(ref totalValue, 
            computedValue, totalValue));
        // If no other thread updated the running total, then 
        // totalValue and initialValue are equal when CompareExchange
        // compares them, and computedValue is stored in totalValue.
        // CompareExchange returns the value that was in totalValue
        // before the update, which is equal to initialValue, so the 
        // loop ends.

        // The function returns computedValue, not totalValue, because
        // totalValue could be changed by another thread between
        // the time the loop ends and the function returns.
        return computedValue;
    }

Are there special rules for local variables iam missing here that explains why the example does not use a memory barrier?

The second example (without the snapshot into `initialValue`) is clearly very wrong, as it is reading `totalValue` at multiple points, thus losing any guarantees of sanity; if it was me, I'd be using `initialValue = Volatile.Read(ref totalValue);` in the top one, but ... — Marc Gravell, May 14 '19 at 09:10
Yes the second example is supposed to be wrong. The question is can optimizations result in the same broken code? If so then the example from microsoft is wrong. If this is not possbile iam curious as to why it is correct. — Barsonax, May 14 '19 at 09:12
that's not an "optimization", though - that's very very different code that does *multiple* reads, not just one read that is re-ordered; I can't see a way that the *single read* can be re-ordered past the point where it would impact the CEX — Marc Gravell, May 14 '19 at 09:13
If thats the case a Volatile.Read or any memory barrier is not needed either. If it works with a stale value then the compare exchange would ensure it would retry with a updated value until it succeeds. — Barsonax, May 14 '19 at 09:20
indeed - the `volatile` there was purely to avoid an extra loop in that scenario - it shouldn't change the final outcome, but it might avoid some extra load when heavily contested; but then... one could argue that in many load scenarios, 2 fails in a row are unlikely... (which is the main time I'd expect to see this matter) — Marc Gravell, May 14 '19 at 09:27
I think most of the time it would add a tiny bit of extra overhead since it has to fetch the value from memory even if its already in cache. One only really knows what is faster when its benchmarked with a certain workload though — Barsonax, May 14 '19 at 09:32

Dmytro Mukalov · Accepted Answer · 2019-05-14T10:32:00.147

A CPU never "reorders" instructions in way which may affect logic of a single-threaded execution. In case when

initialValue = totalValue;
computedValue = initialValue + addend;

the second operation is definitely depends on value set in the previous operation. A CPU "understands" this from its single-threaded logic perspective, so this sequence never will be reordered. However the following sequences can be reordered:

initialValue = totalValue;
anotherValue = totalValue;

or

varToInitialize = someVal;
initialized = true;

As you can see the single-core execution won't be affected but on multiple cores this can impose some problems. For example if we build our logic around the fact that if variable initialized is set to true then the varToInitialize should be initialized with some value we can be in trouble on multi-core environment:

if (initialized)
{
    var storageForVal = varToInitialize; // can still be not initalized
    ...
    // do something with storageForVal with assumption that we have correct value
}

As for the local variables. The problem of reordering is problem of global visibility, that is visibility of changes made by one core/CPU to other cores/CPUs. The local variables are mainly tend to be visible only by single thread (apart from some rare scenarios like closures which in the case of exposure outside of a method effectively aren't local variables) so other threads don't have an access to them and thus other cores/CPUs don't require their global visibility. So in other words in vast majority of cases you don't need to worry about local variable operations reordering.

But in a single threaded operation it would be correct to replace initialValue with totalValue right? It only leads to bugs if you consider multiple threads. — Barsonax, May 14 '19 at 10:00
@Barsonax, in this specific case the `initialValue` is only needed to detect whether the new value is successfully stored or not because of some newer value (the same principle as for optimistic locking). And of course in single-thread environment we don't need this additional verification as no other thread is going re-write the target value so we don't need `initialValue` too. — Dmytro Mukalov, May 14 '19 at 10:11

Interlocked.CompareExchange instruction reodering of the initialvalue

1 Answers1