x86_64 has an instruction, movdir64b, which to my understanding is a non-temporal copy (well, at least the store is) of 64 bytes (a cache line). AArch64 seems to have a similar instruction, st64b, which does an atomic store of the same size. However, the official ARMv9 documentation is not clear about whether st64b, too, is a non-temporal store.
Intel's instruction-set reference documentation for movdir64b is much more detailed, but I'm not far enough along in my studies to fully understand what each memory type protocol represents.
From what I could deduce so far, the x86_64 instruction movntdq is roughly equivalent to stnp, and is write-combining. From that, it seems as if movdir64b is like four of those in one atomic store, hence my guess about st64b.
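To make the "four of those" comparison concrete, here is a minimal sketch of the x86_64 side as I picture it: four 16-byte movntdq stores (through the SSE2 intrinsic _mm_stream_si128) covering one 64-byte line. The helper name store_line_nt is made up for illustration, and the memcpy fallback on other targets is only there so the snippet compiles everywhere; it is not non-temporal. Unlike movdir64b, these four stores are separate instructions, so nothing makes them atomic as a unit.

```c
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper (not from any library): write one 64-byte line
 * to dst, which is assumed to be 64-byte aligned. src may be unaligned. */
static void store_line_nt(void *dst, const void *src)
{
#if defined(__x86_64__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (int i = 0; i < 4; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary 16-byte load */
        _mm_stream_si128(d + i, v);          /* movntdq: non-temporal 16-byte store */
    }
    _mm_sfence();  /* order the non-temporal stores before later stores */
#else
    /* Assumption: plain copy as a portable stand-in, NOT a non-temporal store. */
    memcpy(dst, src, 64);
#endif
}
```

What I'm asking is essentially whether st64b gives the same effect on AArch64 as a single atomic operation.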
This is almost certainly an oversimplification of what's really going on (and could be wrong/inaccurate, of course), but it's what I could deduce so far.
Could st64b be used as if it were an atomic sequence of four stnp instructions, as a non-temporal write of a cache line in this way?