For the sake of simplicity, let’s assume we have exactly 8 threads and a byte array exactly 8 bytes long. Each thread is assigned one byte of this array: it may freely modify its own byte but none of the other bytes in the array.
Let’s also assume that the array is aligned on an 8-byte boundary.
At first sight it looks thread-safe to let the threads modify their (and only their) bytes ad libitum, since no data is actually shared. But, as I understand it, all current Intel and AMD processors running 64-bit Windows read and write no less than 8 bytes (64 bits) at a time. So I suppose that when modifying just 1 byte of an aligned 8-byte block, the CPU reads all 8 bytes, modifies the byte in question, and writes the 1 modified byte back together with the 7 unmodified ones. If so, a write by another thread to a neighbouring byte could be overwritten in between, which is anything but thread-safe, so I suspect a LOCK prefix would be necessary when writing these bytes directly.
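To make the setup concrete, here is a minimal sketch of the experiment I have in mind, written in C++11. The names `bytes`, `worker`, `run_experiment`, `kThreads` and `kIterations` are mine and purely illustrative. Each thread increments only its own byte with plain (non-atomic, non-LOCKed) accesses; `volatile` is there only so the compiler really emits one load and one store per iteration. I am assuming here that the language treats each array element as a separate memory location, and a passing run would only be evidence about the machine it ran on, not a proof.

```cpp
#include <cassert>
#include <thread>
#include <vector>

constexpr int kThreads = 8;           // one thread per byte
constexpr int kIterations = 1000000;  // increments per thread (arbitrary choice)

// The 8-byte array, aligned on an 8-byte boundary as in the question.
alignas(8) unsigned char bytes[kThreads] = {};

// Each thread touches only bytes[index], using ordinary byte loads and stores.
void worker(int index) {
    assert(index >= 0 && index < kThreads);
    volatile unsigned char* p = &bytes[index];
    for (int i = 0; i < kIterations; ++i) {
        *p = static_cast<unsigned char>(*p + 1);  // wraps modulo 256
    }
}

// Returns true if every byte ends up with the value a lone thread would produce,
// i.e. no thread's update was clobbered by a neighbour's write.
bool run_experiment() {
    for (auto& b : bytes) b = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < kThreads; ++i) threads.emplace_back(worker, i);
    for (auto& t : threads) t.join();

    const unsigned char expected =
        static_cast<unsigned char>(kIterations % 256);
    for (unsigned char b : bytes) {
        if (b != expected) return false;
    }
    return true;
}
```

Calling `run_experiment()` from `main` and checking the result is all there is to it. If the CPU really did an unprotected 8-byte read-modify-write for every byte store, I would expect some bytes to end up with a wrong value and the function to return `false`.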
Though I really hope I’m wrong. Any ideas?