Scatter intrinsics in AVX

Question

I can't find them in the Intel Intrinsic Guide v2.7. Do you know if AVX or AVX2 instruction sets support them?

Gathered loads: http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_bk_avx2_masked_gather.htm - I don't see the scattered store intrinsics though — Paul R, Dec 24 '12 at 11:24
From RWT: _[AVX2 does not include scatter instructions (i.e., vector addressed stores), because of complications with the x86 memory ordering model and the load/store buffers.](http://www.realworldtech.com/haswell-cpu/2/)_ — elmattic, Dec 27 '12 at 14:32

Marat Dukhan · Accepted Answer · 2015-12-01T17:42:18.933

score 23

There are no scatter or gather instructions in the original AVX instruction set.
AVX2 adds gather, but not scatter instructions.
AVX512F includes both scatter and gather instructions.
AVX512PF additionally provides prefetch variants of gather and scatter instructions.
AVX512CD provides instructions to detect conflicts in scatter addresses.
Intel MIC (aka Xeon Phi, Knights Corner) does include gather and scatter instructions, but it is a separate coprocessor and cannot run normal x86-64 code.
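
For illustration, here is a minimal sketch of what a native scatter looks like through the AVX512F intrinsic `_mm512_i32scatter_ps`. The helper name `scatter16_ps` and its signature are hypothetical (they are not part of any library), the indices are assumed to be in bounds, and the scalar loop is what you fall back to on targets without AVX512F, which includes original AVX and AVX2.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[idx[i]] = src[i] for i in [0, 16).
   Assumes every index is in bounds for dst. */
static void scatter16_ps(float *dst, const int *idx, const float *src)
{
    assert(dst != NULL && idx != NULL && src != NULL);
#if defined(__AVX512F__)
    __m512i vindex = _mm512_loadu_si512((const void *)idx); /* 16 x 32-bit indices */
    __m512  values = _mm512_loadu_ps(src);                  /* 16 x float */
    /* scale = 4: the indices count floats, so each is multiplied by sizeof(float) */
    _mm512_i32scatter_ps(dst, vindex, values, 4);
#else
    /* No scatter instruction available at compile time: one scalar store per element */
    for (size_t i = 0; i < 16; ++i)
        dst[idx[i]] = src[i];
#endif
}
```

Built with something like `-mavx512f` the intrinsic path is used; without it, the same call compiles to the scalar loop.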

edited Dec 01 '15 at 17:42

answered Dec 24 '12 at 11:33


@Jeff No it doesn't! KNC even has a separate ELF machine type – Marat Dukhan Nov 30 '15 at 06:31
@Jeff: KNL (Knights Landing) should run x86_64 machine code, though, right? It's even going to be available as a host CPU, rather than just a coprocessor. – Peter Cordes Nov 30 '15 at 07:22
@PeterCordes Yes. I have binaries that run on both Haswell Xeon E3 with AVX2 and Knights Landing with AVX-512. – Jeff Hammond Nov 30 '15 at 13:02
@MaratDukhan That's mixing two issues. Mac and Linux ELF binaries aren't compatible yet they may both be for x86_64. Let's not mix up HW and OS. – Jeff Hammond Nov 30 '15 at 13:04

score 12 · Answer 2 · answered Jul 10 '13 at 19:10

As the other answer indicated, there is no native scatter instruction for now, even in AVX2. However, the Intel optimization manual does provide a hand-written version of the scatter operation; it is on page 11-17 of the 2013 edition. Basically, for each element they read the index into a general-purpose register, say `rax`, then shift the element they want into the low position of an xmm register using something like `vpalignr`, and finally store it to the memory location with `vmovss` (move scalar single-precision to memory).

I guess this is not very efficient, but it seems to be the only way to do a data scatter on x86 CPUs for now. On Xeon Phi things are beautiful: it has native support for scatter operations, and the first operand, of course, is a memory location. So if your code involves a lot of gather and scatter, I believe switching to Xeon Phi would be a good choice. Please reply and tell me if anything in my answer is wrong.
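
The hand-written scatter described above can be sketched with intrinsics roughly as follows. This is not the manual's listing: the helper names `scatter4_ps` and `scatter8_ps_emulated` are hypothetical, and the sketch moves each element into the low lane with `_mm_shuffle_ps` instead of `vpalignr` before the scalar `_mm_store_ss` store (the intrinsic that corresponds to `movss`/`vmovss`). Indices are assumed to be in bounds, and a plain scalar loop is used when the code is not compiled for AVX.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>

/* Hypothetical helper: store the 4 lanes of v to base[idx[0..3]], one scalar store each. */
static void scatter4_ps(float *base, const int *idx, __m128 v)
{
    /* Lane 0 is already the low element; the others are shuffled down first. */
    _mm_store_ss(&base[idx[0]], v);
    _mm_store_ss(&base[idx[1]], _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
    _mm_store_ss(&base[idx[2]], _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));
    _mm_store_ss(&base[idx[3]], _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)));
}
#endif

/* Hypothetical helper: base[idx[i]] = src[i] for i in [0, 8), without a scatter instruction.
   Assumes every index is in bounds for base. */
static void scatter8_ps_emulated(float *base, const int *idx, const float *src)
{
    assert(base != NULL && idx != NULL && src != NULL);
#if defined(__AVX__)
    __m256 v = _mm256_loadu_ps(src);                         /* 8 x float */
    scatter4_ps(base, idx, _mm256_castps256_ps128(v));       /* lanes 0..3 */
    scatter4_ps(base, idx + 4, _mm256_extractf128_ps(v, 1)); /* lanes 4..7 */
#else
    for (size_t i = 0; i < 8; ++i)
        base[idx[i]] = src[i];
#endif
}
```

Either way the cost is one store per element, which is why this is only a workaround and not a replacement for a real scatter instruction.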

Good Luck!

xiangpisaiMM

Thanks for your insight. I'm pinning my hopes on AVX3 (because it will probably bring native scatter with the unification of Core and MIC SIMD instructions). — elmattic, Jul 15 '13 at 08:06
Shift and then store sounds slower than using `extractps`, since the element to extract is a compile-time constant. Or maybe it's the same speed but smaller code size, since it still has to use the shuffle port. — Peter Cordes, Nov 30 '15 at 07:24
