Cache line sharing between processors. False sharing. Data race condition?

Question

I am trying to simulate a data race with this code:

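#include <atomic>
#include <iostream>
#include <new>      // std::hardware_destructive_interference_size
#include <thread>

struct Foo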
{
  // ~ 3100ms
  alignas(std::hardware_destructive_interference_size) std::atomic<int> Counter1 = 0;
  alignas(std::hardware_destructive_interference_size) std::atomic<int> Counter2 = 0;

  // ~ 5900ms
  //std::atomic<int> Counter1 = 0;
  //std::atomic<int> Counter2 = 0;

  // ~ 130ms
  //alignas(std::hardware_destructive_interference_size) int Counter1 = 0;
  //alignas(std::hardware_destructive_interference_size) int Counter2 = 0;

  // ~ 270ms
  //int Counter1 = 0;
  //int Counter2 = 0;
};

int main()
{
  Foo       FooObj;
  const int IncrementsByThread = 100000000;

  std::thread T1([&FooObj, IncrementsByThread](){
      for (int i = 0; i < IncrementsByThread; i++)
        FooObj.Counter1++;
    });

  std::thread T2([&FooObj, IncrementsByThread](){
      for (int i = 0; i < IncrementsByThread; i++)
        FooObj.Counter2++;
    });

  T1.join();
  T2.join();

  std::cout << "Counters are " << FooObj.Counter1 << ", " << FooObj.Counter2 << std::endl;

  return 0;
}

The result is always the same: the counters are equal, so there is no data race. But false sharing does occur when the data is not aligned.

Without alignment, Counter1 and Counter2 are placed next to each other in memory, so one cache line holds both of them and we get false sharing. With alignas(), the false sharing is gone.
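One way to check the layout claim directly is to compare the addresses of the two counters. The sketch below is only an illustration: the struct names, `shares_line`, and `check_layout` are made up for this example, and the 64-byte line size is an assumption (typical for x86-64, but not guaranteed) used instead of std::hardware_destructive_interference_size so that the example also builds on toolchains that do not provide that constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Adjust for the target CPU.
constexpr std::size_t kAssumedCacheLine = 64;

// Hypothetical mirror of the "plain int" variant of Foo.
struct PackedCounters {
    int Counter1 = 0;
    int Counter2 = 0;
};

// Hypothetical mirror of the alignas() variant of Foo.
struct PaddedCounters {
    alignas(kAssumedCacheLine) int Counter1 = 0;
    alignas(kAssumedCacheLine) int Counter2 = 0;
};

// True when both addresses fall into the same (assumed) cache line.
inline bool shares_line(const void* a, const void* b) {
    const auto ia = reinterpret_cast<std::uintptr_t>(a);
    const auto ib = reinterpret_cast<std::uintptr_t>(b);
    return ia / kAssumedCacheLine == ib / kAssumedCacheLine;
}

// The unaligned counters are adjacent; the aligned ones are a full line apart.
static_assert(offsetof(PackedCounters, Counter2) == sizeof(int),
              "plain ints are expected to be adjacent");
static_assert(offsetof(PaddedCounters, Counter2) == kAssumedCacheLine,
              "aligned counters are expected to start on separate lines");

void check_layout() {
    // Align the packed object so it cannot straddle a line boundary.
    alignas(kAssumedCacheLine) PackedCounters packed;
    PaddedCounters padded;
    assert(shares_line(&packed.Counter1, &packed.Counter2));
    assert(!shares_line(&padded.Counter1, &padded.Counter2));
}
```

Note that a PackedCounters object that is not itself line-aligned could, in principle, straddle a line boundary, so "adjacent" usually, but not always, means "same cache line".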

But I expected a data race in the case of plain int counters without alignas(). It looks like cache coherence works in all four cases: atomics or plain int, with or without alignas()?

When do cache coherence protocols apply? I thought they applied only to atomic operations.

I tried atomics and alignas() to avoid false sharing. A single int counter shared by both threads obviously produces a data race, but I thought that two plain non-atomic counters placed next to each other in one cache line could also produce a data race, so that Counter1 != Counter2 at the end.
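The difference between "two counters in one cache line" and "one counter shared by two threads" can be sketched as follows. This is a minimal illustration, not the code from the question: all function names are hypothetical, and the lost-update case is modelled with a separate relaxed load and store on a std::atomic<int>, as a stand-in for the single plain int case, so that the example itself contains no undefined behavior.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

struct Plain {
    int a = 0;
    int b = 0;  // adjacent to a, most likely in the same cache line
};

// Each thread writes only its own int: distinct memory locations, no data race.
// The shared cache line costs time (false sharing), not correctness.
std::pair<int, int> run_separate(int n) {
    assert(n >= 0);
    Plain c;
    std::thread t1([&c, n] { for (int i = 0; i < n; ++i) ++c.a; });
    std::thread t2([&c, n] { for (int i = 0; i < n; ++i) ++c.b; });
    t1.join();
    t2.join();
    return {c.a, c.b};
}

// Both threads increment ONE counter with an atomic read-modify-write:
// no update can be lost, so the result is always 2 * n.
int run_shared_atomic(int n) {
    std::atomic<int> c{0};
    auto work = [&c, n] { for (int i = 0; i < n; ++i) c.fetch_add(1); };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return c.load();
}

// Both threads increment ONE counter as a separate load and store.
// Each access is atomic, but the increment as a whole is not, so updates
// from the other thread can be overwritten. The result is at most 2 * n.
int run_shared_split(int n) {
    std::atomic<int> c{0};
    auto work = [&c, n] {
        for (int i = 0; i < n; ++i)
            c.store(c.load(std::memory_order_relaxed) + 1,
                    std::memory_order_relaxed);
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return c.load();
}
```

Calling `run_separate(n)` should always return `{n, n}`, and `run_shared_atomic(n)` should always return `2 * n`. Only `run_shared_split(n)` can come up short, and how often it does depends on timing and hardware.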

False sharing doesn't change results, it just affects speed. It is not a data race. A data race is when two or more threads modify the same variable without synchronization between those threads. — NathanOliver, Jun 29 '23 at 11:56
The result also depends on the CPU hardware. Some systems will allow CPUs to have unsynced copies of the cache line. Intel x86 will not, it always makes the caches coherent. — BoP, Jun 29 '23 at 12:00
Yes, I understand that false sharing has nothing to do with data races. I think I have shown that this code example has false sharing, so Counter1 and Counter2 are stored in one cache line. Why is there no data race? Because of cache coherence protocols? When do these protocols kick in? I thought coherence applied only to atomic operations. — Alexey Usachov, Jun 29 '23 at 12:01
*"I was thinking that coherence fixes only atomic operations"* Atomic operations are guaranteed to be coherent, but nothing stops other operations from being coherent as well. — BoP, Jun 29 '23 at 12:03
By the way, I just tried this example in Xcode with Apple clang, and the result is the same: Counter1 == Counter2. Coherence also works on Mac in this case. The first time, I used MSVC on Windows. — Alexey Usachov, Jun 29 '23 at 12:16
There is no data race because you are modifying two separate variables. The CPU ensures that you get the correct behavior with cache coherence protocols. — NathanOliver, Jun 29 '23 at 12:19
Please, note what I got when I compiled `for (int i = 0; i < IncrementsByThread; i++) FooObj.Counter1++;` with `int Counter1` uncommented: `mov rax, QWORD PTR [rdi+8] add DWORD PTR [rax+4], 100000000` FYI: [demo on Compiler Explorer](https://godbolt.org/z/v7Eo53156) Hence, it might not really be incrementing what you're measuring. — Scheff's Cat, Jun 29 '23 at 14:42
@Scheff'sCat, yeah, that is good optimization :) But with my measurement of 270 milliseconds on my hardware, it doesn't look like the same case as just adding two values to memory. Maybe starting and joining the threads takes that long... but I'm not sure. — Alexey Usachov, Jun 30 '23 at 05:13
I disassembled this piece of code with MSVC, and it looks like a real increment for the plain int: `FooObj.Counter1++; 00CE8177 mov eax,dword ptr [this] 00CE817A mov ecx,dword ptr [eax] 00CE817C mov edx,dword ptr [ecx] 00CE817E add edx,1` — Alexey Usachov, Jun 30 '23 at 05:29
@AlexeyUsachov No, I found this in the MSVC code as well. (You can find it in the link I provided in my previous comment.) Please, pay attention to the optimization level. Try `/O2` (MSVC) `-O2` (gcc). (When you measure timings, optimization should be enabled always.) — Scheff's Cat, Jun 30 '23 at 09:02
@Scheff'sCat thanks, you are right. Optimization can cause this difference. — Alexey Usachov, Jun 30 '23 at 10:13
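The optimization point from the comments can be sketched like this: with optimization enabled, a loop over a plain int may be folded into a single add, so a benchmark has to keep each increment observable. The example below is one hedged way to do that. `timed_increments` is a hypothetical helper, and `volatile` is used here only to force a load and a store per iteration. It provides no synchronization between threads and is not a substitute for std::atomic.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs n increments that the compiler must perform one by one,
// and returns the final value together with the elapsed time.
std::pair<int, std::chrono::milliseconds> timed_increments(int n) {
    assert(n >= 0);
    volatile int counter = 0;  // each access must really touch memory
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        counter = counter + 1;  // explicit load, add, store
    const auto stop = std::chrono::steady_clock::now();
    const int result = counter;
    return {result,
            std::chrono::duration_cast<std::chrono::milliseconds>(stop - start)};
}
```

Timing only the loop, rather than thread creation and join as well, also separates the cost of the increments from the cost of starting the threads.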
