Interlocked.Read/Exchange for longs on 64-bit architectures

Question

Is Interlocked.Read(ref long) "optimized away" on 64-bit architectures? That is, if I am writing a library that could be used on both architectures, should I be concerned about the performance impact of using Interlocked.Read unnecessarily on 64-bit CPUs?

I thought about using something like this, so I am wondering if this makes sense:

    using System.Runtime.CompilerServices; // MethodImpl, MethodImplOptions
    using System.Threading;                // Interlocked

    // X64 is a preprocessor constant set for x64 builds

    public static class AtomicLong // illustrative class name
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static long Read(ref long address)
        {
    #if X64
            // atomic on 64-bit processors
            return address;
    #else
            // if I got it right, this creates a full memory barrier
            return Interlocked.Read(ref address);
    #endif
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static void Write(ref long address, long value)
        {
    #if X64
            // atomic on 64-bit processors
            address = value;
    #else
            // if I got it right, this creates a full memory barrier
            Interlocked.Exchange(ref address, value);
    #endif
        }
    }

You should never have to worry about "performance impact of using `Interlocked.Read`" — H H, Jun 13 '16 at 16:59
@Henk: I thought it was "don't optimize too soon", not "don't optimize ever, period". If multiple threads are using this code concurrently, I don't see the need for a full memory barrier on each access. Furthermore, the number of PCs with 32-bit processors is already low, and it is only going to get lower. — Lou, Jun 13 '16 at 17:06
Of course I was exaggerating but you should really demonstrate this to be the choke point. I would assume Interlocked knows about the needs of the target platform, and only bother with this after profiling points to it. And even though I don't know your code I'd say that's very unlikely. — H H, Jun 13 '16 at 17:16
It is an *intrinsic* to the x64 jitter. The methods disappear completely and you get a single CPU instruction. XCHG for the write, LOCK CMPXCHG for the read. Highly optimized, not free. — Hans Passant, Jun 13 '16 at 17:18
@Hans: thanks! Yes, I went through the code and saw that `Read` is actually `CompareExchange(ref value, 0, 0);`. I presumed it's an intrinsic, but it's the actual memory barrier which bothers me. Of course, not "bothers me" as in "gives me sleepless nights" :). — Lou, Jun 13 '16 at 17:36
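As the comment above notes, `Interlocked.Read` is implemented as `CompareExchange(ref value, 0, 0)`. A minimal sketch of why that works as a read (the class and method names are hypothetical, for illustration only): the value is replaced with 0 only if it is already 0, so the stored value never changes, while the original value is returned atomically. It is still a locked read-modify-write, which is the "not free" part Hans mentions.

```csharp
using System;
using System.Threading;

public static class ReadViaCompareExchange
{
    // Hypothetical helper: compares 'location' with 0 and, only if equal,
    // stores 0 (i.e. no visible change). Returns the original value atomically.
    // This is a locked read-modify-write, so it requires write access to the memory.
    public static long AtomicRead(ref long location) =>
        Interlocked.CompareExchange(ref location, 0, 0);

    static void Main()
    {
        long value = 0x1234_5678_9ABC_DEF0;
        Console.WriteLine(AtomicRead(ref value) == Interlocked.Read(ref value)); // True
        Console.WriteLine(value == 0x1234_5678_9ABC_DEF0);                        // True: unchanged
    }
}
```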

Peter Ritchie · Accepted Answer · answered Jun 14 '16 at 13:41

Your concern about the performance impact of Interlocked is unnecessary, because Interlocked doesn't just perform an atomic action on values; it also ensures that the value is visible to all threads (sequential consistency). Let me explain. On some architectures (some 64-bit architectures included), a value written to a memory location may sit in a store buffer or be reordered with other memory operations, and the compiler or JIT may keep a value in a register or reorder accesses as well. Simply reading a value may therefore not return the "latest" value written by another thread, even though the read itself is atomic. Interlocked also acts as a full memory fence: no read or write can be moved across it, so every memory operation before the fence becomes visible to other threads before any operation after it.

So, while you might improve performance by a minuscule amount, you're also introducing potential race conditions. The JIT treats the Interlocked methods as intrinsics and emits a platform-appropriate instruction sequence (a single locked instruction on x64), so there is little left to optimize by hand.
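A minimal sketch of the visibility problem described above, assuming a simple "publish a value, then set a flag" pattern (field and class names are hypothetical). Because both Interlocked calls are full fences, the payload write cannot move after the flag write, the flag load cannot be hoisted out of the loop, and the payload read cannot move before the flag read. With plain reads of `_ready`, the JIT may hoist the load out of the loop, and the loop might never exit.

```csharp
using System;
using System.Threading;

public static class PublishWithInterlocked
{
    static long _data;  // payload written by the producer
    static long _ready; // 0 = not published, 1 = published

    public static long Run()
    {
        _data = 0;
        Interlocked.Exchange(ref _ready, 0);

        var producer = new Thread(() =>
        {
            _data = 42;                          // plain write of the payload
            Interlocked.Exchange(ref _ready, 1); // full fence: payload write cannot move after this
        });
        producer.Start();

        // Interlocked.Read is a full fence: the load is re-executed on every
        // iteration, and the read of _data below cannot move before it.
        while (Interlocked.Read(ref _ready) == 0)
        {
            Thread.SpinWait(20);
        }

        long result = _data; // observes 42
        producer.Join();
        return result;
    }

    static void Main() => Console.WriteLine(Run()); // 42
}
```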

Unfortunately the documentation for Interlocked is still not quite up to par on these details. See http://www.albahari.com/threading/part4.aspx for more details on the fence involved in Interlocked operations.

score 0 · Answer 2 · answered Jun 13 '16 at 18:51

I can only answer the second question: you shouldn't be concerned with it. All you can do is determine whether reads and writes to the variable need to be thread-safe and code accordingly. C# is an abstraction; you're writing for the language, not the processor. The compiler and the .NET Framework worry about the processor.
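Coding for thread safety "for the language" can be as simple as routing every access through Interlocked, with the same code path on every architecture. A minimal sketch; `AtomicInt64` and `AtomicInt64Demo` are hypothetical names, not a library type:

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: identical behavior in 32-bit and 64-bit processes.
public sealed class AtomicInt64
{
    private long _value;

    public AtomicInt64(long initial) => _value = initial;

    // Atomic, fully fenced read (no torn reads, even in a 32-bit process).
    public long Read() => Interlocked.Read(ref _value);

    // Atomic, fully fenced write.
    public void Write(long value) => Interlocked.Exchange(ref _value, value);

    // Atomic increment; returns the incremented value.
    public long Increment() => Interlocked.Increment(ref _value);
}

public static class AtomicInt64Demo
{
    public static long CountConcurrently(int threadCount, int incrementsPerThread)
    {
        var counter = new AtomicInt64(0);
        var threads = new Thread[threadCount];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++) counter.Increment();
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        return counter.Read();
    }

    static void Main() => Console.WriteLine(CountConcurrently(4, 100_000)); // 400000
}
```

Whether the JIT can do anything cheaper on a given platform is then its concern, not the library's.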

64-bit reads are guaranteed to be atomic in a 64-bit process (for properly aligned values), but as you said, you're writing for both architectures. If the cost of locking is a significant impediment that you can avoid on a 64-bit architecture, then that impediment will remain an unsolved problem on a 32-bit architecture.
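Note that the guarantee depends on the process, not the CPU: ECMA-335 only promises atomic plain reads and writes for properly aligned values no larger than a native int, so a 32-bit process on 64-bit hardware gets no such guarantee for `long`. A small sketch (the `Bitness` class name is hypothetical) that reports what the running process provides; even when the check is true, a plain read is still not a memory fence.

```csharp
using System;

public static class Bitness
{
    // IntPtr.Size is the native int size in bytes for the current process.
    public static bool PlainInt64AccessIsGuaranteedAtomic() => IntPtr.Size >= sizeof(long);

    static void Main()
    {
        Console.WriteLine($"64-bit OS:      {Environment.Is64BitOperatingSystem}");
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"IntPtr.Size:    {IntPtr.Size}");
        Console.WriteLine($"Plain long access atomic by spec: {PlainInt64AccessIsGuaranteedAtomic()}");
    }
}
```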

There's a cost associated with locking, but the greater cost would come from the unpredictable behavior that would come from not coding for thread safety.
