
Even though the paper "Memory Barriers: a Hardware View for Software Hackers" is considered extremely old (by its author; it seems Paul himself answered this question), I find it an excellent aid for building a mental model of memory ordering.

There is a little thing though that I don't understand:

Let's consider the page with a memory barrier:

[Image: memory barrier page 1]

Step 4 states that "b=1" is written to a store buffer because "a=1" is not written to the cache yet.

The thing that I can't get is why on the next page:

[Image: screenshot of the next page, no description provided]

in step 3, "b=1" is written to the cache line, even though there is a memory barrier after "a=1" and "a=1" has not yet been written to the cache? Following the reasoning on the previous page, "b=1" should be written to the cache only after (or within) step 10, when the store buffer entry containing "a=1" is written to the cache.
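To make the previous-page reasoning concrete, here is a minimal Python sketch of the store-buffer rule as I understand it from the paper: a barrier marks the pending store-buffer entries, and any later store must queue behind a marked entry even if its cache line is already owned. The names `ToyCPU`, `store`, `barrier`, `receive_ack`, and `run_foo` are hypothetical and exist only for this illustration; the model ignores every coherence message except the invalidate acknowledgement.

```python
class ToyCPU:
    """Toy model of one CPU with a private cache and a store buffer."""

    def __init__(self, owned_lines):
        self.owned = set(owned_lines)  # lines this CPU may write directly
        self.cache = {}                # variable -> value committed to the cache
        self.store_buffer = []         # entries: [variable, value, marked]

    def store(self, var, value):
        # A marked entry means a barrier was executed after an earlier store
        # that is still pending, so later stores may not bypass it.
        blocked = any(marked for _, _, marked in self.store_buffer)
        if var in self.owned and not blocked:
            self.cache[var] = value
            return "cache"
        self.store_buffer.append([var, value, False])
        return "store buffer"

    def barrier(self):
        # Mark every entry currently sitting in the store buffer.
        for entry in self.store_buffer:
            entry[2] = True

    def receive_ack(self, var):
        # The invalidate was acknowledged: the line is now owned, so the
        # store buffer can drain into the cache in program order.
        self.owned.add(var)
        while self.store_buffer and self.store_buffer[0][0] in self.owned:
            name, value, _ = self.store_buffer.pop(0)
            self.cache[name] = value


def run_foo(with_barrier):
    """Replay a=1; [barrier]; b=1 on a CPU that owns b but not a."""
    cpu0 = ToyCPU(owned_lines={"b"})
    cpu0.store("a", 1)
    if with_barrier:
        cpu0.barrier()
    where_b = cpu0.store("b", 1)
    cpu0.receive_ack("a")
    # Dict order records the order in which values reached the cache.
    return where_b, list(cpu0.cache)
```

Under these assumptions, `run_foo(True)` reports that "b=1" goes to the store buffer and reaches the cache only after "a=1", which is the behaviour I expected on the next page as well.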

Hadi Brais
  • Pages 9 and 11 from the PDF you linked are slightly different from the ones included in the question. In particular, step 3 of Section 4.3 is different. – Hadi Brais Jul 12 '18 at 09:37
  • Hmm, interesting. Thank you for pointing that out. Indeed, I copied the link from the previous question, but apparently had a copy of an old revision of the book. I will reread the correct one. Thanks. – Artem Konovalenkov Jul 12 '18 at 12:00
  • You can revise or delete the question accordingly. – Hadi Brais Jul 12 '18 at 12:27

1 Answer


The PDF that you linked differs from the screenshots in your question, so I presume the old version was incorrect (or at least not precise enough).

Section 4.3 actually starts with the following remark:

Let us suppose that CPUs queue invalidation requests, but respond to them immediately. This approach minimizes the cache-invalidation latency seen by CPUs doing stores, but can defeat memory barriers, as seen in the following example.

The sequence is also a bit different from what you posted:

  1. CPU 0 executes a=1. The corresponding cache line is read-only in CPU 0’s cache, so CPU 0 places the new value of "a" in its store buffer and transmits an "invalidate" message in order to flush the corresponding cache line from CPU 1's cache.

  2. CPU 1 executes while (b==0) continue;, but the cache line containing "b" is not in its cache. It therefore transmits a "read" message.

  3. CPU 1 receives CPU 0's "invalidate" message, queues it, and immediately responds to it.

  4. CPU 0 receives the response from CPU 1, and is therefore free to proceed past the smp_mb() on line 4 above, moving the value of "a" from its store buffer to its cache line.

I believe this is a hypothetical scenario, but with that in mind, the clearly problematic part is CPU 1 acknowledging an "invalidate" message before actually invalidating its cache line, which makes CPU 0 think it can proceed.
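A rough way to see the hazard is to replay the sequence from CPU 1's point of view in a toy Python model. This is only a sketch under simplifying assumptions: `run_bar`, `load`, `drain`, and the `memory` dict (standing in for CPU 0's up-to-date copies) are hypothetical names for illustration, not anything from the paper or from real hardware. The `drain_before_load` flag models CPU 1 applying its queued invalidations before it reads "a", which is what a memory barrier on the reading side is meant to force.

```python
def run_bar(drain_before_load):
    """Toy replay of CPU 1's view; returns the value of a that CPU 1 reads."""
    memory = {"a": 0, "b": 0}   # stands in for CPU 0's up-to-date copies
    cpu1_cache = {"a": 0}       # CPU 1 starts with a read-only copy of a
    invalidate_queue = []

    # CPU 0 executes a=1 and sends "invalidate"; CPU 1 only queues the
    # message but acknowledges it immediately.
    invalidate_queue.append("a")
    # The early acknowledgement lets CPU 0 pass its barrier and commit
    # both stores.
    memory["a"] = 1
    memory["b"] = 1

    def load(var):
        # A cache hit returns the cached value, even if an invalidation
        # for that line is still waiting in the queue.
        if var not in cpu1_cache:
            cpu1_cache[var] = memory[var]
        return cpu1_cache[var]

    def drain():
        # Apply every queued invalidation to the cache.
        while invalidate_queue:
            cpu1_cache.pop(invalidate_queue.pop(0), None)

    while load("b") == 0:       # b misses in the cache and is fetched as 1
        pass
    if drain_before_load:
        drain()
    return load("a")
```

In this model, `run_bar(False)` returns the stale value 0 because the queued invalidation has not been applied when "a" is read, while `run_bar(True)` returns 1.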

vgru