
I was reading Intel's manual on the Xeon Phi instruction set and couldn't understand how the scatter/gather instructions work.

Suppose I have the following vector of doubles:

A-> |b4|a4|b3|a3|b2|a2|b1|a1|

Is it possible to create 4 vectors as follows:

V1->|b1|a1|b1|a1|b1|a1|b1|a1|
V2->|b2|a2|b2|a2|b2|a2|b2|a2|
V3->|b3|a3|b3|a3|b3|a3|b3|a3|
V4->|b4|a4|b4|a4|b4|a4|b4|a4|

using these instructions? Is there any other way to achieve this?

Asked by user1715122 (edited by Boppity Bop)
  • Yeah, I don't think Larrabee is the appropriate word for it. I think the OP is referring to the Xeon Phi architecture, which was just released recently. I haven't seen much information out there yet. [Here is a link to its instruction set reference](http://software.intel.com/sites/default/files/forum/278102/327364001en.pdf). It looks like it has some pretty powerful capabilities. – Jason R Mar 12 '13 at 12:21
  • They are pretty much the same thing. – Jasper Bekkers Mar 12 '13 at 13:08


I got this from the Intel Forums (answered by Evgueni Petrov):

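__m512d V1 = _mm512_castsi512_pd(_mm512_extload_epi32(Addr, _MM_UPCONV_EPI32_NONE, _MM_BROADCAST_4X16, _MM_HINT_NONE));

where `Addr` is a `double *` pointing to the memory from which the doubles were loaded into vector `A`. The `_MM_BROADCAST_4X16` form reads four 32-bit elements (16 bytes, i.e. the two doubles a1 and b1) and replicates them into all four 128-bit lanes of the 512-bit register, which yields exactly |b1|a1|b1|a1|b1|a1|b1|a1|. The integer result is then reinterpreted as doubles with `_mm512_castsi512_pd`, so no gather is needed.

V2, V3 and V4 are built the same way by passing `Addr + 2`, `Addr + 4` and `Addr + 6` respectively, since each (a, b) pair occupies two doubles (16 bytes).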
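For anyone without a Xeon Phi at hand, here is a small portable scalar model of what the four broadcast loads compute. It is only a sketch for checking the data layout, not KNC code: the helper names `broadcast_pair_4x` and `build_pair_vectors` are hypothetical, and it assumes `A` is stored contiguously in memory as a1, b1, a2, b2, ..., a4, b4 (the order implied by the |b4|a4|...|b1|a1| notation, which lists the highest element first).

```c
#include <assert.h>
#include <string.h>

/* Scalar model of one _MM_BROADCAST_4X16 load: take 16 bytes (one
   (a, b) pair of doubles) and repeat them in each of the four 128-bit
   lanes of a 512-bit vector, i.e. 8 doubles in total. */
static void broadcast_pair_4x(const double *src, double dst[8])
{
    for (int lane = 0; lane < 4; ++lane)
        memcpy(&dst[2 * lane], src, 2 * sizeof(double));
}

/* Build V1..V4 from the 8 doubles of A laid out as a1,b1,a2,b2,...
   v[k] models the load from Addr + 2*k in the answer above. */
static void build_pair_vectors(const double *addr, double v[4][8])
{
    assert(addr != NULL);
    for (int k = 0; k < 4; ++k)
        broadcast_pair_4x(addr + 2 * k, v[k]);
}
```

With `A` holding a1, b1, ..., a4, b4 in memory, row `v[k]` holds the pair (a(k+1), b(k+1)) repeated four times, matching V(k+1) in the question.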

Answered by user1715122