I need CAS functions to use in a context of multiple threads running on the same CPU (assume that all threads are statically glued to a selected CPU via SetThreadAffinityMask).
InterlockedCompareExchange generates LOCK CMPXCHG. The LOCK part comes with side effects such as a cache miss, a bus lock and a potential for contention with other CPUs, all of which are nice, but feel like an extravagant excess given the circumstances. Since the competing threads run on the same CPU, I assume the LOCK can be dropped, and I further assume that dropping it should improve performance.
So this is my first question - do I assume correctly?
--
I know how to generate CMPXCHG with inline assembly for the 32-bit version. Also, as per this SO thread, I know how to do it for the 64-bit version too, but as a function call.
What I don't understand, and this is my second question, is how to generate an inlined version of it.
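To make it concrete, here is a rough sketch of the kind of thing I am after: a 64-bit CMPXCHG without the LOCK prefix that the compiler can inline. It assumes a GCC/Clang-style compiler with extended inline asm, which is an assumption on my part (MSVC does not accept inline asm for x64 targets, so this will not build there as written). The name cas64_nolock is made up for illustration and is not a Windows API; the non-x86-64 branch is only a fallback so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): compare-and-swap WITHOUT the
   LOCK prefix. Returns the value that was in *dest before the operation,
   mirroring the return convention of InterlockedCompareExchange64.
   Assumption: every thread touching *dest is pinned to the same logical
   CPU; across CPUs this is NOT atomic. */
static inline uint64_t cas64_nolock(volatile uint64_t *dest,
                                    uint64_t exchange,
                                    uint64_t comparand)
{
    uint64_t prev = comparand;
#if defined(__x86_64__)
    /* CMPXCHG compares RAX with *dest. If equal, *dest = exchange;
       otherwise RAX = *dest. Either way RAX ends up holding the old value. */
    __asm__ __volatile__(
        "cmpxchgq %2, %1"
        : "+a"(prev), "+m"(*dest)
        : "r"(exchange)
        : "cc", "memory");
#else
    /* Fallback for other targets: a normal (fully atomic) builtin CAS. */
    __atomic_compare_exchange_n(dest, &prev, exchange, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
    return prev;
}

/* Small usage example: one successful and one failed exchange. */
void cas64_demo(void)
{
    volatile uint64_t v = 5;
    assert(cas64_nolock(&v, 7, 5) == 5 && v == 7); /* matched: swapped   */
    assert(cas64_nolock(&v, 9, 5) == 7 && v == 7); /* mismatch: no change */
}
```

Because the helper is static inline and the asm statement is a single instruction, an optimizing build should emit the CMPXCHG at the call site rather than a function call, though whether that actually happens depends on the compiler and flags.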
--
Thanks.