Why does a read barrier make all effects prior to another CPU's store perceptible?

Question

The Linux kernel's documentation on memory barriers (https://www.kernel.org/doc/Documentation/memory-barriers.txt) uses this example to show that, in the SMP case, a read barrier can make all effects prior to another CPU's store perceptible. Why does a read barrier do that?

+-------+       :      :                :       :
|       |       +------+                +-------+
|       |------>| A=1  |------      --->| A->0  |
|       |       +------+      \         +-------+
| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
|       |       +------+        |       +-------+
|       |------>| B=2  |---     |       :       :
|       |       +------+   \    |       :       :       +-------+
+-------+       :      :    \   |       +-------+       |       |
                             ---------->| B->2  |------>|       |
                                |       +-------+       | CPU 2 |
                                |       :       :       |       |
                                |       :       :       |       |
  At this point the read ---->   \  rrrrrrrrrrrrrrrrr   |       |
  barrier causes all effects      \     +-------+       |       |
  prior to the storage of B        ---->| A->1  |------>|       |
  to be perceptible to CPU 2            +-------+       |       |
                                        :       :       +-------+
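
For reference, the sequence in the diagram can be approximated in userspace C. This is only a sketch under stated assumptions, not the kernel's code: C11 release and acquire fences stand in for the kernel's write and read barriers (they are somewhat stronger than `smp_wmb()` and `smp_rmb()`), and the names `cpu1`, `cpu2`, and `run_once` are hypothetical. The initial values A=0 and B=9 are taken from the diagram.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables from the diagram. */
static atomic_int A, B;

/* CPU 1: store A, write barrier, store B. */
static void *cpu1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    /* Assumption: a release fence stands in for the kernel's write barrier. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&B, 2, memory_order_relaxed);
    return NULL;
}

/* CPU 2: load B until the new value shows up, read barrier, load A. */
static void *cpu2(void *arg)
{
    int *seen_a = arg;

    while (atomic_load_explicit(&B, memory_order_relaxed) != 2)
        ; /* spin until CPU 1's store to B is observed */

    /* Assumption: an acquire fence stands in for the kernel's read barrier. */
    atomic_thread_fence(memory_order_acquire);
    *seen_a = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/* Runs both sides once and returns the value of A that CPU 2 observed. */
int run_once(void)
{
    pthread_t t1, t2;
    int seen_a = -1;

    atomic_store(&A, 0); /* initial values as in the diagram */
    atomic_store(&B, 9);

    pthread_create(&t2, NULL, cpu2, &seen_a);
    pthread_create(&t1, NULL, cpu1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return seen_a;
}
```

Calling `run_once()` from a `main` (built with `-pthread`) should return 1 on every run, because CPU 2 only reads A after it has seen B == 2 and passed its fence. With either fence removed, the C11 memory model would also allow it to return 0.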

Because cache is coherent, and this is release/acquire synchronization. https://preshing.com/20120913/acquire-and-release-semantics/ and https://preshing.com/20130922/acquire-and-release-fences/. Not sure which piece of the answer you're missing, so IDK which part to write an answer about. — Peter Cordes, Jan 20 '22 at 15:36
I don't think those URLs are helpful. See [Preshing on weak vs. strong memory models](https://preshing.com/20120930/weak-vs-strong-memory-models/) or, better, [Wikipedia's runtime memory ordering](https://en.wikipedia.org/wiki/Memory_ordering#Runtime_memory_ordering). On some CPUs, the 'load/store' unit is a separate logical system. Writes/reads to similar locations are more efficient. As such, it may re-order the reads/writes from **program** order to increase throughput. In some cases, this matters and you need a runtime barrier (versus a compiler barrier). This is often the case with memory-mapped hardware devices. — artless noise, Jan 20 '22 at 22:03
There is also the question of what is *atomic*. I.e., if a value is too big, it can take several instructions to write/read it. For instance, an 8-bit CPU (AVR) might need four writes to update a 32-bit value. If the value is `0xff` and we add one, you might see `0x1ff` or `0x00` for a brief moment instead of `0x100`. The size of *atomic* values is also often important... Other considerations are write-back/write-through cache, write buffers, per-CPU L1 with unified L2, etc. — artless noise, Jan 20 '22 at 22:17
Asking **why** a barrier makes memory effects visible is like asking why `printf` prints things, or why the `mul` instruction multiplies. Because that's its job! If your question is **how** it does that, please clarify, but also explain what level of detail and/or sophistication you are looking for. — Nate Eldredge, Jan 21 '22 at 05:19

0 Answers