I recently read the section on transitivity in memory-barriers, where the author emphasizes that only general barriers can guarantee transitivity. However, I can't understand it very well. For example:
CPU 1                   CPU 2                   CPU 3
======================= ======================= =======================
        { X = 0, Y = 0 }
STORE X=1               LOAD X                  STORE Y=1
                        <read barrier>          <general barrier>
                        LOAD Y                  LOAD X
Suppose X is in CPU 3's cache in the Modified state, and Y is in CPU 2's cache, also in the Modified state.
Suppose also that CPU 1 shares its store buffer with CPU 2, and that we add a write barrier before the read barrier on CPU 2 (so that it becomes a general barrier).
1) CPU 1 places the new value of X (X = 1) in the store buffer.
2) CPU 2 reads the value of X from the (shared) store buffer.
3) CPU 2 marks X in the store buffer (write barrier) and processes its invalidate queue to make sure there are no pending invalidate messages from CPU 3 (read barrier).
4) CPU 2 wants to change the cache line of X from Invalid to Modified, so it sends an invalidate message to CPU 3.
5) CPU 3 receives the invalidate message for X, puts it in its invalidate queue, and acknowledges it to CPU 2.
6) CPU 2 receives the acknowledgement, then writes X = 1 to memory or cache, and loads Y == 0.
...
7) When CPU 3 executes its general barrier, it finds the invalidate message for X in its invalidate queue; after that, X must be equal to 1.
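The scenario above can be replayed as a toy, single-threaded model in C. This is only a sketch under strong assumptions, not a description of real hardware: it assumes CPU 1 and CPU 2 share a single-entry store buffer, that CPU 3 sees only coherent memory, that a general barrier on CPU 2 simply drains the shared store buffer, and it ignores invalidate queues and cache-line states entirely. It also replays one fixed interleaving. All names (struct machine, run_litmus, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, for illustration only: CPU 1 and CPU 2 share one
 * single-entry store buffer; CPU 3 sees only coherent memory. */
enum { X, Y, NVARS };

struct machine {
    int mem[NVARS];   /* coherent memory, visible to every CPU */
    bool sb_valid;    /* shared store buffer holds a pending store */
    int sb_var;       /* which variable the pending store targets */
    int sb_val;       /* the pending value */
};

/* A store by CPU 1 or CPU 2 lands in the shared store buffer first. */
static void store_buffered(struct machine *m, int var, int val)
{
    m->sb_valid = true;
    m->sb_var = var;
    m->sb_val = val;
}

/* A load by CPU 1 or CPU 2 checks the shared store buffer before memory. */
static int load_snooping(const struct machine *m, int var)
{
    if (m->sb_valid && m->sb_var == var)
        return m->sb_val;
    return m->mem[var];
}

/* Assumed effect of a general barrier here: drain the store buffer. */
static void drain(struct machine *m)
{
    if (m->sb_valid) {
        m->mem[m->sb_var] = m->sb_val;
        m->sb_valid = false;
    }
}

struct outcome { int cpu2_x, cpu2_y, cpu3_x; };

/* Replays one fixed interleaving of the three-CPU example. */
struct outcome run_litmus(bool cpu2_general_barrier)
{
    struct machine m = { .mem = { 0, 0 }, .sb_valid = false };
    struct outcome o;

    store_buffered(&m, X, 1);         /* CPU 1: STORE X=1 */
    o.cpu2_x = load_snooping(&m, X);  /* CPU 2: LOAD X (hits the buffer) */
    if (cpu2_general_barrier)
        drain(&m);                    /* CPU 2: <general barrier> */
    /* else: <read barrier>, which leaves the store buffer alone here */
    o.cpu2_y = load_snooping(&m, Y);  /* CPU 2: LOAD Y */
    m.mem[Y] = 1;                     /* CPU 3: STORE Y=1 */
    /* CPU 3: <general barrier>, nothing pending for CPU 3 in this model */
    o.cpu3_x = m.mem[X];              /* CPU 3: LOAD X, from memory only */
    return o;
}
```

In this model, run_litmus(false) gives CPU 2 the values X == 1 and Y == 0 while CPU 3 still loads X == 0, and run_litmus(true) makes CPU 3 load X == 1 for the same interleaving.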
That much I can understand. However, I read another example, Figure 14.3 of perfbook, shown below:
thread0(void) {
    A = 1;
    smp_wmb();
    B = 1;
}
thread1(void) {
    while (B == 0)
        continue;
    barrier();
    C = 1;
}
thread2(void) {
    while (C == 0)
        continue;
    barrier();
    assert(A == 1);
}
In some cases the assertion can fire. In the answer to Quick Quiz 14.2, the author says that changing every barrier() to smp_mb() fixes it.
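For reference, the fixed version can be sketched in portable C11 with pthreads. This is a translation made for illustration, not the book's code: it assumes a sequentially consistent atomic_thread_fence() is an acceptable stand-in for the kernel's smp_mb(), it uses a full fence in thread0 as well (stronger than the original write barrier, since C11 has no direct equivalent), and run_once() is a hypothetical helper that reports the value of A seen by thread2 instead of asserting inside the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int A, B, C;

/* Assumed stand-in for the kernel's smp_mb(): a C11 seq_cst fence. */
#define smp_mb() atomic_thread_fence(memory_order_seq_cst)

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    smp_mb();  /* full fence; stronger than the original write barrier */
    atomic_store_explicit(&B, 1, memory_order_relaxed);
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&B, memory_order_relaxed) == 0)
        continue;
    smp_mb();  /* replaces barrier() */
    atomic_store_explicit(&C, 1, memory_order_relaxed);
    return NULL;
}

static void *thread2(void *arg)
{
    int *seen_a = arg;

    while (atomic_load_explicit(&C, memory_order_relaxed) == 0)
        continue;
    smp_mb();  /* replaces barrier() */
    *seen_a = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/* Hypothetical helper: runs the three threads once and returns the
 * value of A that thread2 observed after seeing C == 1. */
int run_once(void)
{
    void *(*fn[3])(void *) = { thread0, thread1, thread2 };
    pthread_t t[3];
    int seen_a = -1;

    atomic_store(&A, 0);
    atomic_store(&B, 0);
    atomic_store(&C, 0);
    for (int i = 0; i < 3; i++) {
        /* Only thread2 needs an argument: where to record A. */
        int rc = pthread_create(&t[i], NULL, fn[i], i == 2 ? &seen_a : NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return seen_a;
}
```

Under the C11 memory model, the fences in thread0, thread1, and thread2 form a happens-before chain from the store to A to the load of A, so run_once() should always return 1. A passing run on one machine shows nothing about the barrier() version, though, because a weak-ordering failure may simply never show up on a given CPU.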
So, my question is: why do we need to change the barrier() in thread1 to smp_mb()? Suppose thread0 and thread1 run on CPU 0 and CPU 1, and these two CPUs share a store buffer. After thread1 executes the store C = 1, their store buffer will look like this:
[A(wb), B, C]
Because thread2 (running on CPU 2) also uses smp_mb() instead of barrier(), it is guaranteed that A must be 1 if it sees C == 1.
Everything above is described in terms of the MESI cache coherence protocol. Maybe the author means that there are other protocols under which the barrier() in thread1 must be replaced by smp_mb() to guarantee transitivity?
Can anybody give me an example, please?
Maybe it's a mistake to think about transitivity in terms of one specific protocol. What we must remember is that rmb() or wmb() can't guarantee transitivity, because there are so many different protocols and architectures.