According to the "Data Prefetch to L1 Data Cache" section of the Intel 64 and IA-32 Architectures Optimization Reference Manual (Sept 2019), the PREFETCHNTA instruction works if "Load is from writeback memory type."
My question is whether "writeback memory type" includes ordinary heap memory.
According to the first answer (by BeeOnRope) to the question "Do current x86 architectures support non-temporal loads (from "normal" memory)?": "Yes, recent mainstream Intel CPUs support non-temporal loads on normal memory - but only "indirectly" via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores where you can just use the corresponding non-temporal store instructions directly."
I asked a similar question, "Can we use non-temporal mov instructions on heap memory?", and the answer (by Peter Cordes) was: "You can use NT stores like movntps on normal WB memory (i.e. the heap)." This question, however, is about non-temporal loads (not stores) with PREFETCHNTA.
From what I have read, it looks like PREFETCHNTA works with ordinary heap memory, but I wonder why it's always qualified by "must be writeback memory type."
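For concreteness, here is a minimal sketch of the kind of usage I have in mind: summing a malloc'd buffer while issuing PREFETCHNTA ahead of the loads through the `_mm_prefetch` intrinsic with `_MM_HINT_NTA`. The function names (`sum_with_nta_prefetch`, `demo_heap_sum`), the `PREFETCH_NTA` macro, and the 512-byte prefetch distance are my own illustrative choices, not taken from the manual or the quoted answers, and the distance is untuned. The code only shows the calling pattern; it does not by itself demonstrate what memory type the heap has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* Not x86: no PREFETCHNTA, so the hint becomes a no-op in this sketch. */
#define PREFETCH_NTA(p) ((void)(p))
#endif

/* Assumption: an untuned, illustrative prefetch distance. */
#define PREFETCH_DISTANCE_BYTES 512

/* Sum n 32-bit values, hinting that data further ahead is non-temporal. */
uint64_t sum_with_nta_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    const size_t ahead = PREFETCH_DISTANCE_BYTES / sizeof *data;
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Issue one prefetch every 16 elements (64 bytes). A prefetch is
           only a hint and does not fault, but stay in bounds anyway. */
        if (i % 16 == 0 && i + ahead < n)
            PREFETCH_NTA(&data[i + ahead]);
        sum += data[i];
    }
    return sum;
}

/* Hypothetical helper: fill a heap buffer with 0..n-1 and sum it. */
uint64_t demo_heap_sum(size_t n)
{
    uint32_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i;
    uint64_t sum = sum_with_nta_prefetch(buf, n);
    free(buf);
    return sum;
}
```

Calling `demo_heap_sum(1000)` returns the same value with or without the prefetch, since the instruction is purely a performance hint; whether it actually bypasses or limits cache pollution here is exactly what depends on the memory type.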