Suppose we have a 64-bit global variable, initially zero.
volatile uint64_t gDest = 0;
Store (atomic) – in one thread. At some point, we atomically increment this 64-bit variable:
AtomicIncrement64 (&gDest);
Here AtomicIncrement64 can be any compiler intrinsic or builtin that performs an atomic 64-bit increment, such as InterlockedIncrement64 (cl) or __sync_add_and_fetch (GCC).
Load – in another thread
uint64_t a = gDest;
Suppose the compiler implements the load using two machine instructions: the first reads the lower 32 bits into eax, and the second reads the upper 32 bits into edx. In this case, if a concurrent atomic store to gDest becomes visible between the two instructions, will the load result in a torn read?