Why is a general barrier needed to guarantee transitivity between CPUs?

Question

I recently read about transitivity in memory-barriers, where the author emphasizes that only a general barrier can guarantee transitivity. I can't quite understand why. For example:

CPU 1                      CPU 2                      CPU 3
=======================    =======================    =======================
{ X = 0, Y = 0 }
STORE X=1                  LOAD X                     STORE Y=1
                           <read barrier>             <general barrier>
                           LOAD Y                     LOAD X

Suppose X is in CPU3's cache in the modified state, and Y is in CPU2's cache, also in the modified state.

Suppose also that CPU1 shares its store buffer with CPU2, and that we add a write barrier before CPU2's read barrier (so that together they become a general barrier).

1) CPU1 places the new value of X (X=1) in the store buffer.

2) CPU2 reads the value of X from the shared store buffer.

3) CPU2 marks X in the store buffer (write barrier) and processes its invalidate queue to make sure no invalidate messages from CPU3 are pending (read barrier).

4) CPU2 wants to change the cache line of X from invalid to modified, so it sends an invalidate message to CPU3.

5) CPU3 receives the invalidate message for X, puts it in its invalidate queue, and responds to CPU2.

6) CPU2 receives the response, then writes X = 1 to memory or cache, and loads Y == 0.

...

7) When CPU3 executes its general barrier, it will find the invalidate message for X in its invalidate queue; after that, its load of X must return 1.
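To make the steps above concrete, here is a minimal single-threaded toy model in C. It is only a sketch under strong assumptions, not a description of real hardware: CPU1 and CPU2 share one store buffer, CPU2 can read a buffered store early, a general barrier on CPU2 drains that buffer, a read barrier leaves it alone, and the buffer never drains on its own during the run. Invalidate queues are not modeled. The names run_litmus, is_forbidden, struct machine and struct outcome are made up for this illustration.

```c
#include <stdbool.h>

/* Values loaded in one run of the three-CPU example. */
struct outcome {
    int cpu2_x;   /* CPU 2: LOAD X */
    int cpu2_y;   /* CPU 2: LOAD Y */
    int cpu3_x;   /* CPU 3: LOAD X */
};

/* Toy machine state (hypothetical model, not real hardware). */
struct machine {
    int mem_x, mem_y;   /* coherent memory, which is all CPU 3 can see */
    bool sb_has_x;      /* store buffer shared by CPU 1 and CPU 2 holds X */
    int sb_x;           /* the buffered value of X */
};

/* Write the buffered store, if any, out to coherent memory. */
static void drain(struct machine *m)
{
    if (m->sb_has_x) {
        m->mem_x = m->sb_x;
        m->sb_has_x = false;
    }
}

/* CPU 2 shares the buffer with CPU 1, so it sees the buffered store early. */
static int cpu2_load_x(const struct machine *m)
{
    return m->sb_has_x ? m->sb_x : m->mem_x;
}

/* Run one fixed interleaving; the flag selects CPU 2's barrier type. */
struct outcome run_litmus(bool cpu2_general_barrier)
{
    struct machine m = { .mem_x = 0, .mem_y = 0, .sb_has_x = false, .sb_x = 0 };
    struct outcome o;

    m.sb_has_x = true;            /* CPU 1: STORE X=1 (stays in the buffer) */
    m.sb_x = 1;

    o.cpu2_x = cpu2_load_x(&m);   /* CPU 2: LOAD X (forwarded from buffer) */
    if (cpu2_general_barrier)
        drain(&m);                /* general barrier: assumed to drain buffer */
                                  /* read barrier: buffer is left untouched */
    o.cpu2_y = m.mem_y;           /* CPU 2: LOAD Y */

    m.mem_y = 1;                  /* CPU 3: STORE Y=1, made visible by its
                                     general barrier before the next load */
    o.cpu3_x = m.mem_x;           /* CPU 3: LOAD X from coherent memory */
    return o;
}

/* The result that transitivity is supposed to rule out. */
bool is_forbidden(struct outcome o)
{
    return o.cpu2_x == 1 && o.cpu2_y == 0 && o.cpu3_x == 0;
}
```

Under these assumptions, run_litmus(false) (read barrier only) gives CPU2 X == 1 and Y == 0 while CPU3 still loads X == 0, and run_litmus(true) (general barrier) makes CPU3 load X == 1. The model only picks one unlucky interleaving; it does not prove anything about other designs.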

That much I understand. However, I read another example in Figure 14.3 of perfbook, shown below:

thread0(void) {
    A = 1;
    smp_wmb();
    B = 1;
}
thread1(void) {
    while (B == 0)
        continue;
    barrier();
    C = 1;
}
thread2(void) {
    while (C == 0)
        continue;
    barrier();
    assert(A == 1);
}

The assertion can fire in some cases. In the answer to Quick Quiz 14.2, the author says that changing every barrier() to smp_mb() fixes it.

So my question is: why do we need to change the barrier() in thread1 to smp_mb()? Suppose thread0 and thread1 run on CPU0 and CPU1, which share a store buffer. After thread1 executes the store C = 1, their store buffer will look like this:

[A(wb), B, C]

Because thread2 (running on CPU2) also uses smp_mb() instead of barrier(), it is guaranteed that A must be 1 if thread2 sees C == 1.
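The reasoning about the [A(wb), B, C] buffer can be checked with a small toy model in C. This is a sketch of my assumed model only: thread0 and thread1 share one store buffer, an entry marked by a write barrier must drain before any entry queued behind it, unmarked entries may drain in any order, and thread2 (after its smp_mb()) always reads the current memory contents. The names assert_is_safe, explore, may_drain and struct entry are invented for this illustration.

```c
#include <stdbool.h>

enum var { VAR_A, VAR_B, VAR_C, NVARS };

/* One pending store of the value 1 in the toy shared store buffer. */
struct entry {
    enum var var;
    bool marked;   /* true if a write barrier marked this entry */
};

/* In this toy model an entry may drain to memory only after every
 * marked entry queued ahead of it has drained. */
static bool may_drain(const struct entry *buf, int idx, const bool *drained)
{
    for (int i = 0; i < idx; i++)
        if (buf[i].marked && !drained[i])
            return false;
    return true;
}

/* Try every legal drain order. Returns false as soon as memory could
 * show C == 1 while A == 0, which is the case thread2's assert catches. */
static bool explore(const struct entry *buf, bool *drained, int *mem)
{
    if (mem[VAR_C] == 1 && mem[VAR_A] == 0)
        return false;
    for (int i = 0; i < NVARS; i++) {
        if (drained[i] || !may_drain(buf, i, drained))
            continue;
        drained[i] = true;          /* drain entry i to memory */
        mem[buf[i].var] = 1;
        bool ok = explore(buf, drained, mem);
        drained[i] = false;         /* undo, then try the next choice */
        mem[buf[i].var] = 0;
        if (!ok)
            return false;
    }
    return true;
}

/* with_wmb selects whether thread0's store to A is marked by smp_wmb(). */
bool assert_is_safe(bool with_wmb)
{
    const struct entry buf[NVARS] = {
        { VAR_A, with_wmb }, { VAR_B, false }, { VAR_C, false },
    };
    bool drained[NVARS] = { false, false, false };
    int mem[NVARS] = { 0, 0, 0 };

    return explore(buf, drained, mem);
}
```

In this model assert_is_safe(true) returns true and assert_is_safe(false) returns false, so the marked entry for A is what keeps C from reaching memory first. It says nothing about machines where thread0 and thread1 do not share a buffer, or where a store can become visible to different CPUs at different times.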

Everything above is described in terms of the MESI cache coherency protocol. Maybe the author means that there are other protocols under which the barrier() in thread1 must be replaced by smp_mb() to guarantee transitivity?

Can anybody give me an example, please?

Maybe it is a mistake to think about transitivity in terms of one specific protocol. What we must remember is that rmb() or wmb() can't guarantee transitivity, because there are so many different protocols and architectures.

A memory barrier may be required for architectures that implement weak cache coherence models. This would be necessary to guarantee that X and Y's modifications appear externally in the right order. Even on x86, if one processor were to write X and Y, and they were both on the same cache line, then another CPU reading that same line would not be able to know the order in which they were written. - Timothy Miller, Jan 10 '16 at 21:25
Does that mean that, under weak cache coherence, changing a cache line's state doesn't wait for all the invalidate responses to return? - user1310866, Jan 11 '16 at 14:08
