Intel software manual says clwb "Writes back to memory the cache line (if modified) that contains the linear address specified with the memory operand from any level of the cache hierarchy in the cache coherence domain. The line may be retained in the cache hierarchy in non-modified state. clwb is ordered with respect to older writes to the cache line being written back"

My question concerns the following pseudocode:

write(A)
clwb (A)

Does clwb take care of the write that is still in the store buffer, or do I need an sfence after the write and before the clwb, like this:

write (A)
sfence
clwb (A)

Is the sfence actually required? Thanks.

Peter Cordes
Arun Kp
  • 3
    I'm pretty sure you don't need `sfence` before `clflush`, `clflushopt`, or `clwb`. If there was a problem with the store buffer, note that `sfence` can retire from the OoO back-end before the store buffer is actually drained; if hardware didn't catch the dependency at all you might need `mfence`, which would actually prevent `clwb` from *executing* before the store data was committed to L1d cache. But like I said, I'm pretty sure you don't need anything. – Peter Cordes Sep 05 '20 at 08:56
  • 3
    The last sentence you quoted appears differently in my copy: “CLWB is implicitly ordered with older stores executed by the logical processor to the same address.” I think that clearly answers your question, doesn’t it? (I wonder which is newer. Mine is rev 70, May 2019, which is outdated.) – prl Sep 05 '20 at 09:19
  • @PeterCordes Thanks a lot for your reply. Have a great day. – Arun Kp Sep 05 '20 at 17:27
  • @prl thanks a lot, I was referring to "Order Number: 325384-070US May 2019" – Arun Kp Sep 05 '20 at 17:28
  • 1
    The paragraph you quoted is not in the Intel SDM revision 70. Where did you quote it from? – prl Sep 05 '20 at 18:08

2 Answers

On Intel processors, the clwb instruction is ordered with respect to older writes to the same cache line. On AMD processors, according to Section 7.6.3 of Volume 2 of the AMD manual No. 24593, the clwb instruction is ordered with respect to older writes to the same cache line if the memory type of the target address is a cacheable memory type (i.e., WB, WT, or WP) at the time of executing the clwb instruction.

This ordering guarantee means that the state of the line at the point of the clwb in program order, or a later state, will eventually be written back (if necessary) to the persistence domain at some point after the clwb instruction retires. Note that the persistence domain is defined by the platform.
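As a minimal sketch of what this guarantee permits, the C fragment below issues a store and then a write-back of the same line with no fence in between. The helper names (`writeback_line`, `store_and_writeback`) are hypothetical. The `_mm_clwb` path assumes the code is built with a flag such as `-mclwb` and run on a CPU that supports CLWB; the `_mm_clflush` fallback is only there so the sketch builds on other x86 targets. The fragment cannot demonstrate persistence; it only shows the instruction order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Hypothetical helper: request write-back of the cache line containing p. */
static inline void writeback_line(const void *p)
{
#if defined(__GNUC__)
    /* Compiler-only barrier (emits no instruction): keeps the compiler from
       moving the preceding store past the write-back request. */
    __asm__ __volatile__("" ::: "memory");
#endif
#if defined(__CLWB__)
    _mm_clwb((void *)p);   /* assumption: built with -mclwb, CPU has CLWB */
#elif defined(__SSE2__)
    _mm_clflush(p);        /* fallback: writes back too, but evicts the line */
#else
    (void)p;               /* non-x86 build: placeholder, does nothing */
#endif
}

/* Store, then write back the same line. No sfence sits between the two:
   the write-back is ordered after older stores to that cache line. */
static inline void store_and_writeback(uint64_t *dst, uint64_t value)
{
    assert(dst != NULL);
    *dst = value;
    writeback_line(dst);
}
```

Whether a fence is needed after the write-back is a separate question: it depends on what later operations must be ordered behind it.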

Hadi Brais
  • Thanks. Does that mean that, with a single thread of execution, the sequence "store A, clwb(A), store B, clwb(B)" is correct without sfence on Intel x86-64, since TSO ensures that store(A) and store(B) are ordered, clwb(A) is ordered with store(A), and clwb(B) is ordered with store(B)? – Arun Kp May 17 '21 at 08:01

Here is my answer to the follow-up question: "With a single thread of execution, is the sequence 'store A, clwb(A), store B, clwb(B)' correct without sfence on Intel x86-64, since TSO ensures that store(A) and store(B) are ordered, clwb(A) is ordered with store(A), and clwb(B) is ordered with store(B)?"

clwb instructions are not ordered with each other if they flush different cache lines. TSO only guarantees that stores become visible in program order (i.e., they are written to cache in program order). So in your example, within the cache hierarchy, store A always completes before store B, but store B could reach memory (either volatile or non-volatile) before store A. If you only need the two stores to be ordered within the cache hierarchy, no sfence is required.

But if you need to guarantee that store A always reaches memory before store B, you need to insert an sfence between clwb(A) and store(B).
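The placement described above can be sketched in C as follows. This is an illustration under assumptions, not a definitive persistence routine: `writeback_line`, `store_fence`, and `persist_in_order` are hypothetical names, the `_mm_clwb` path assumes a build with `-mclwb` on a CPU that supports CLWB, and the `_mm_clflush` fallback only keeps the sketch buildable on other x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

/* Hypothetical helper: request write-back of the cache line containing p. */
static inline void writeback_line(const void *p)
{
#if defined(__GNUC__)
    __asm__ __volatile__("" ::: "memory");  /* compiler-only barrier */
#endif
#if defined(__CLWB__)
    _mm_clwb((void *)p);   /* assumption: built with -mclwb, CPU has CLWB */
#elif defined(__SSE2__)
    _mm_clflush(p);        /* fallback: writes back too, but evicts the line */
#else
    (void)p;               /* non-x86 build: placeholder, does nothing */
#endif
}

/* Hypothetical helper: sfence where available, otherwise nothing. */
static inline void store_fence(void)
{
#if defined(__SSE__)
    _mm_sfence();
#endif
}

/* Store A, write it back, fence, and only then store B. */
static inline void persist_in_order(uint64_t *a, uint64_t va,
                                    uint64_t *b, uint64_t vb)
{
    assert(a != NULL && b != NULL);
    *a = va;
    writeback_line(a);   /* ordered after the store to *a (same line) */
    store_fence();       /* orders the write-back of A before store B */
    *b = vb;
    writeback_line(b);
    store_fence();       /* only needed if later stores must follow B */
}
```

Without the first `store_fence()`, the two stores would still reach the cache in order, but the two write-backs target different lines and could complete in either order.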

Peter Cordes
DDelphine