This question does not assume any specific architecture. Assume that we have a multicore processor with cache coherence, out-of-order execution, and branch prediction logic. We also assume that stores to memory are strictly in program order.
We have two threads running in parallel, each on a separate core.
Below is the threads' pseudo-code. data and flag are both initially 0.
Thread #1 code:
data=10;
flag=1;
Thread #2 code:
while(!flag);
print data;
With proper synchronization, Thread #2 would eventually print 10. However, the branch predictor could predict that the loop is not entered, so the core performs a speculative read of data, which still contains 0 at that time (before Thread #1 sets data). Suppose the prediction turns out to be correct, i.e. flag is eventually observed as 1. In this case the print data instruction can be retired, but it prints the stale value 0.
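To make the scenario concrete at the language level, here is a minimal C++ sketch, assuming C++11 atomics with relaxed ordering. The function name run_relaxed is hypothetical, and data is made atomic only to avoid a data race in the C++ sense. Note that relaxed ordering is weaker than the hardware assumed above: it permits reordering of the stores as well as the loads, so it cannot express the "stores in program order" assumption. It only shows that, without any ordering constraint on the reader, both 0 and 10 are permitted outcomes.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs both threads once and returns the value
// Thread #2 read from data. With relaxed ordering the memory model
// allows either 0 or 10.
int run_relaxed() {
    std::atomic<int> data{0};
    std::atomic<int> flag{0};
    int observed = -1;

    // Thread #1: write the payload, then raise the flag (no ordering implied).
    std::thread t1([&] {
        data.store(10, std::memory_order_relaxed);
        flag.store(1, std::memory_order_relaxed);
    });

    // Thread #2: busy-wait on the flag, then read the payload.
    // Nothing here forbids the read of data from being satisfied early.
    std::thread t2([&] {
        while (flag.load(std::memory_order_relaxed) == 0) {
        }
        observed = data.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return observed;
}
```

In practice this will print 10 on most runs and most machines; the point is only that 0 is not ruled out.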
The question is whether a memory barrier would somehow prevent the speculative read of data and make the CPU execute the busy-wait properly. An alternative solution could be to let the branch predictor do its work but snoop the writes done by the other core; if a write to data is detected, the ROB could be used to undo the premature read (and its dependent instructions) and then re-execute with the proper data.
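For reference, this is how the barrier question looks when expressed in a language-level memory model rather than in hardware terms. It is a minimal sketch, assuming C++11 atomics; the function name run_release_acquire is hypothetical. The release store and acquire load only state the required ordering. How a given core enforces it (a fence instruction, or speculating and squashing the early load when it snoops an invalidation) is left to the implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs both threads once and returns the value
// Thread #2 read from data.
int run_release_acquire() {
    int data = 0;              // plain, non-atomic payload
    std::atomic<int> flag{0};  // synchronization variable
    int observed = -1;

    // Thread #1: the release store orders the write to data before it.
    std::thread t1([&] {
        data = 10;
        flag.store(1, std::memory_order_release);
    });

    // Thread #2: once the acquire load sees 1, the read of data that
    // follows must observe the write made before the release store.
    std::thread t2([&] {
        while (flag.load(std::memory_order_acquire) == 0) {
        }
        observed = data;
    });

    t1.join();
    t2.join();
    return observed;
}
```

Under the C++ memory model this version must return 10: a stale 0 read speculatively is not allowed to become the architecturally visible result.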
Arch-specific answers are also welcome.