I was referring to Intel's manual on the Xeon Phi instruction set and wasn't able to understand how the scatter/gather instructions work.
Suppose if I have the following vector of doubles:
A-> |b4|a4|b3|a3|b2|a2|b1|a1|
Is it possible to create 4 vectors as follows:
V1->|b1|a1|b1|a1|b1|a1|b1|a1|
V2->|b2|a2|b2|a2|b2|a2|b2|a2|
V3->|b3|a3|b3|a3|b3|a3|b3|a3|
V4->|b4|a4|b4|a4|b4|a4|b4|a4|
using these instructions? Is there any other way to achieve this?