According to "Data Prefetch to L1 Data Cache" in the Intel 64 and IA-32 Architectures Optimization Reference Manual (Sept 2019), the PREFETCHNTA instruction works if "Load is from writeback memory type."

My question: does "writeback memory type" apply to ordinary heap memory?

According to the first answer (by BeeOnRope) to the question "Do current x86 architectures support non-temporal loads (from "normal" memory)?": "Yes, recent mainstream Intel CPUs support non-temporal loads on normal memory - but only "indirectly" via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores where you can just use the corresponding non-temporal store instructions directly."

I asked a similar question, "Can we use non-temporal mov instructions on heap memory?", and the answer (by Peter Cordes) was: "You can use NT stores like movntps on normal WB memory (i.e. the heap)." This question is about non-temporal loads (not stores) with PREFETCHNTA.

From what I have read, it looks like PREFETCHNTA works with ordinary heap memory, but I wonder why it's always qualified by "must be writeback memory type."

Peter Cordes
RTC222

1 Answer

In a user-space process under a mainstream OS, all your memory will be WB (write-back) cacheable, unless you use special system calls to do something like mapping video RAM into your virtual address space. If you aren't doing that, you definitely have write-back memory.

All discussion of other memory types in other answers is just for completeness, to avoid saying things that aren't true in all cases, or to explain what something like the SSE4.1 movntdqa NT load is actually for. movntdqa is useless on WB memory (on current hardware).

(NT prefetch is very different from NT load.)
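To make that concrete, here is a minimal sketch of issuing PREFETCHNTA on an ordinary `malloc` buffer through the `_mm_prefetch` intrinsic with `_MM_HINT_NTA`. The function names, the 64-byte line size, and the prefetch distance are assumptions made up for illustration, not tuned values; a prefetch is only a hint, so how (or whether) the CPU acts on it depends on the microarchitecture.

```c
#include <assert.h>
#include <immintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64                     /* assumed line size on mainstream x86 */
#define PREFETCH_AHEAD (8 * CACHE_LINE)   /* hypothetical distance; measure before trusting it */

/* Sum a buffer in one pass, hinting upcoming lines with PREFETCHNTA.
   (Hypothetical helper, not from the question or the answer.) */
uint64_t sum_bytes_nta(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* One hint per 64-byte stride from the start of the buffer,
           and only for addresses that are still inside the buffer. */
        if (i % CACHE_LINE == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(buf + i + PREFETCH_AHEAD), _MM_HINT_NTA);
        sum += buf[i];
    }
    return sum;
}

/* Allocate plain heap memory (WB under a mainstream OS), fill it, sum it. */
uint64_t demo_heap_sum(size_t n, unsigned char value)
{
    unsigned char *buf = malloc(n);
    if (buf == NULL)
        return 0;
    memset(buf, value, n);
    uint64_t sum = sum_bytes_nta(buf, n);
    free(buf);
    return sum;
}
```

Calling `demo_heap_sum(4096, 1)` returns 4096 with or without the prefetch line: the hint can only change cache behavior, never the result, so any benefit has to be confirmed by measurement.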

Peter Cordes
  • MMIO addresses can have side effects and so cannot be prefetched, aside from not being cacheable. – Paul A. Clayton Jun 13 '20 at 15:54
  • @PaulA.Clayton: Right, but device memory (e.g. video RAM, not MMIO registers) can usefully be mapped WC. I think you *could* map device memory as WB, and use `clflush` all over the place, but you probably wouldn't want to. I think it would be more correct to say that making a page containing MMIO registers cacheable (WT or WB) would be a really bad idea for almost any device, and probably not usable, but the hardware isn't going to stop you from shooting yourself in the foot that way, I think. Unless HW prefetch gets cancelled if it would result in a PCIe access regardless of PAT/MTRR? – Peter Cordes Jun 13 '20 at 16:05
  • @PeterCordes -- many/most x86 systems will hang if you map MMIO as WB and then try to write something. There are many good reasons for this, some discussed at http://sites.utexas.edu/jdm4372/2013/05/29/notes-on-cached-access-to-memory-mapped-io-regions/ and http://sites.utexas.edu/jdm4372/2013/05/30/coherence-with-cached-memory-mapped-io/ – John D McCalpin Jun 18 '20 at 16:56