Memory barriers: A hardware view for software hackers - invalidate queues

Question

Even though the book Memory barriers: a hardware view for software hackers is considered extremely old by its author (it seems Paul himself answered this question), I find it an excellent aid for building a mental model of memory ordering.

There is one small thing, though, that I don't understand:

Let's consider the page with a memory barrier:

Step 4 states that "b=1" is written to the store buffer because "a=1" has not been written to the cache yet.

What I can't understand is why, in step 3 on the next page, "b=1" is written to the cache line even though there is a memory barrier after "a=1" and "a=1" has not yet been written to the cache. Following the reasoning of the previous page, "b=1" should be written to the cache only after (or within) step 10, when the store buffer containing "a=1" is flushed to the cache.
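The asker's reasoning can be made concrete with a toy model of one CPU and its store buffer. This is a hypothetical sketch, not code from the book: it assumes a marking scheme in which `smp_mb()` marks every entry already in the store buffer, and while any marked entry is still pending, later stores must also wait in the buffer even if their cache line is already owned. All names (`sim_cpu`, `sim_store`, `sim_smp_mb`, `sim_ack`) are illustrative, and only the two variables `a` and `b` are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one CPU with a store buffer (not the book's code). */
#define SB_CAP 8

struct sb_entry {
    char var;      /* 'a' or 'b' */
    int value;
    bool marked;   /* set by sim_smp_mb() */
};

struct sim_cpu {
    int cache_a, cache_b;      /* values held in this CPU's cache lines */
    bool owns_a, owns_b;       /* true once the line is exclusively owned */
    struct sb_entry sb[SB_CAP];
    size_t sb_len;
};

bool sim_owns(const struct sim_cpu *c, char var) {
    return var == 'a' ? c->owns_a : c->owns_b;
}

bool sim_has_marked(const struct sim_cpu *c) {
    for (size_t i = 0; i < c->sb_len; i++)
        if (c->sb[i].marked)
            return true;
    return false;
}

/* A store goes straight to the cache only if the line is owned AND no
   marked (pre-barrier) entry is still waiting in the store buffer. */
void sim_store(struct sim_cpu *c, char var, int value) {
    if (sim_owns(c, var) && !sim_has_marked(c)) {
        if (var == 'a') c->cache_a = value; else c->cache_b = value;
        return;
    }
    assert(c->sb_len < SB_CAP);
    c->sb[c->sb_len++] = (struct sb_entry){ var, value, false };
}

/* smp_mb() in this model: mark every entry currently in the store buffer. */
void sim_smp_mb(struct sim_cpu *c) {
    for (size_t i = 0; i < c->sb_len; i++)
        c->sb[i].marked = true;
}

/* The invalidate acknowledgement for `var` arrives: take ownership, then
   drain the store buffer in FIFO order until an entry's line is not owned. */
void sim_ack(struct sim_cpu *c, char var) {
    if (var == 'a') c->owns_a = true; else c->owns_b = true;
    size_t i = 0;
    while (i < c->sb_len && sim_owns(c, c->sb[i].var)) {
        if (c->sb[i].var == 'a') c->cache_a = c->sb[i].value;
        else c->cache_b = c->sb[i].value;
        i++;
    }
    /* Shift any entries that could not be applied to the front. */
    for (size_t j = i; j < c->sb_len; j++)
        c->sb[j - i] = c->sb[j];
    c->sb_len -= i;
}
```

Starting from a CPU that owns the line for `b` but holds only a read-only copy of `a`, the sequence `sim_store(a=1)`, `sim_smp_mb()`, `sim_store(b=1)` leaves `cache_b` at 0 with both stores in the buffer; only `sim_ack(&cpu, 'a')` moves "a=1" and then "b=1" into the cache. Dropping the `sim_smp_mb()` call lets "b=1" reach the cache first, which is the reordering the barrier is meant to prevent.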

Pages 9 and 11 of the PDF you linked are slightly different from the ones included in the question. In particular, step 3 of Section 4.3 is different. — Hadi Brais, Jul 12 '18 at 09:37
Hmm, interesting. Thank you for pointing that out. Indeed, I copied the link from the previous question, but apparently had a copy of the book from an older revision. I will reread the correct one. Thanks. — Artem Konovalenkov, Jul 12 '18 at 12:00

score 1 · Accepted Answer · answered Jul 12 '18 at 10:22

The PDF that you posted is different from the screenshot in your question, so I presume the old version was incorrect (or at least not precise enough).

Section 4.3 actually starts with the following remark:

Let us suppose that CPUs queue invalidation requests, but respond to them immediately. This approach minimizes the cache-invalidation latency seen by CPUs doing stores, but can defeat memory barriers, as seen in the following example.

The sequence is also slightly different from what you posted:

1. CPU 0 executes a=1. The corresponding cache line is read-only in CPU 0’s cache, so CPU 0 places the new value of "a" in its store buffer and transmits an "invalidate" message in order to flush the corresponding cache line from CPU 1's cache.
2. CPU 1 executes while (b==0) continue;, but the cache line containing "b" is not in its cache. It therefore transmits a "read" message.
3. CPU 1 receives CPU 0's "invalidate" message, queues it, and immediately responds to it.
4. CPU 0 receives the response from CPU 1, and is therefore free to proceed past the smp_mb() on line 4 above, moving the value of "a" from its store buffer to its cache line.
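The quoted steps can be replayed deterministically. The sketch below is a hypothetical C model, not code from the book: CPU 1 starts with a read-only copy of `a` (value 0) and no copy of `b`, and everything after step 4 (CPU 0 storing b=1, answering CPU 1's "read", and CPU 1 then reading `a`) is my own reconstruction of how the example continues. The `apply_before_ack` flag switches between queueing the invalidate and acknowledging at once, and invalidating before acknowledging; `run_sequence` and the struct names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the quoted sequence (not the book's code). */
struct sim_line { int value; bool valid; };

struct sim_state {
    struct sim_line cpu0_a, cpu0_b;  /* CPU 0's cache */
    struct sim_line cpu1_a, cpu1_b;  /* CPU 1's cache */
    int store_buffer_a;              /* pending a=1 in CPU 0's store buffer */
    bool invq_has_a;                 /* CPU 1's invalidate queue holds "invalidate a" */
};

/* Returns the value of `a` that CPU 1 reads after leaving while (b==0).
   apply_before_ack == false: CPU 1 queues the invalidate and responds at once.
   apply_before_ack == true:  CPU 1 invalidates its copy, then responds. */
int run_sequence(bool apply_before_ack) {
    struct sim_state s = {
        .cpu0_a = { 0, true }, .cpu0_b = { 0, true },
        .cpu1_a = { 0, true }, .cpu1_b = { 0, false },
    };

    /* Step 1: a=1 on CPU 0; line is read-only, so buffer it and send "invalidate". */
    s.store_buffer_a = 1;

    /* Step 2: CPU 1 misses on b and sends "read" (answered in step 6). */

    /* Step 3: CPU 1 receives the "invalidate". */
    if (apply_before_ack)
        s.cpu1_a.valid = false;      /* invalidate first, then respond */
    else
        s.invq_has_a = true;         /* queue it, respond immediately */

    /* Step 4: CPU 0 gets the response, passes smp_mb(), drains a=1 to its cache. */
    s.cpu0_a.value = s.store_buffer_a;

    /* Reconstructed continuation: CPU 0 stores b=1 into the line it owns... */
    s.cpu0_b.value = 1;
    /* ...and answers CPU 1's "read" with that line. */
    s.cpu1_b = s.cpu0_b;

    /* CPU 1 now sees b == 1 and leaves the loop. */
    assert(s.cpu1_b.valid && s.cpu1_b.value == 1);

    /* CPU 1 reads a: a still-valid (stale) copy wins; otherwise fetch CPU 0's. */
    int seen_a = s.cpu1_a.valid ? s.cpu1_a.value : s.cpu0_a.value;

    /* The queued invalidate is processed only now, too late. */
    if (s.invq_has_a) {
        s.cpu1_a.valid = false;
        s.invq_has_a = false;
    }
    return seen_a;
}
```

In this model `run_sequence(false)` returns 0: CPU 1 has seen b == 1 but still reads its stale `a`. `run_sequence(true)` returns 1, because the invalidated line forces CPU 1 to fetch CPU 0's new value.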

I believe this is a hypothetical scenario, but with that in mind, the obvious problem is that CPU 1 acknowledges the "invalidate" message before actually invalidating its copy of the cache line, which lets CPU 0 believe it can safely proceed past the barrier.
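For comparison, here is a portable C11 sketch of the same pair of functions, assuming the example has the shape of a writer `foo()` (a=1; smp_mb(); b=1) and a reader `bar()` that spins on `b` and then reads `a`. `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's `smp_mb()`; `foo`, `bar`, `observed_a` and `run_once` are illustrative names. Note the fence in the reader as well: under the C11 memory model a writer-side fence alone does not guarantee that `bar()` sees a == 1, which mirrors the hardware problem described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int a, b;
static int observed_a;   /* value of a seen by bar(); read only after join */

static void *foo(void *arg) {
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* plays the role of smp_mb() */
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return NULL;
}

static void *bar(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
        continue;
    atomic_thread_fence(memory_order_seq_cst);   /* reader-side barrier */
    observed_a = atomic_load_explicit(&a, memory_order_relaxed);
    return NULL;
}

/* Runs foo() and bar() on two threads once; returns the a that bar() saw. */
int run_once(void) {
    pthread_t writer, reader;
    int rc;
    atomic_store(&a, 0);
    atomic_store(&b, 0);
    observed_a = -1;
    rc = pthread_create(&reader, NULL, bar, NULL);
    assert(rc == 0);
    rc = pthread_create(&writer, NULL, foo, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return observed_a;
}
```

Every call to `run_once()` should return 1. Without the reader-side fence, C11 would permit a result of 0, although any particular run may never show it.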
