If I have the following doubles in a 512-bit SIMD vector, as in a Xeon Phi register:

m0 = |b4|a4|b3|a3|b2|a2|b1|a1|

is it possible to make it into:

m0_d = |a4|a4|a3|a3|a2|a2|a1|a1| 

using a single instruction? Also, since there are no bitwise intrinsics for doubles, is the following still a valid way to achieve the above?

m0_t = _mm512_swizzle_pd(m0, _MM_SWIZ_REG_CDAB); // m0_t -> |a4|b4|a3|b3|a2|b2|a1|b1|
__m512d res = _mm512_mask_or_epi64(m0, k1, zero, m0_t); // k1 is 0xAA
user1715122
  • Actually, this can be accomplished with: m0_d = _mm512_mask_swizzle_pd(m0, 0xAA, m0, _MM_SWIZ_REG_CDAB); I overlooked the mask variant. – user1715122 Mar 12 '13 at 05:47
  • If that solution works, you might consider adding it as an answer so it is easier to find. The Xeon Phi is still very new, so there isn't a lot of best-practices information out there yet. – Jason R Mar 12 '13 at 12:53

1 Answer

This can be achieved as follows:

m0_d = _mm512_mask_swizzle_pd(m0,0xAA,m0,_MM_SWIZ_REG_CDAB);

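Here the mask 0xAA selects the odd-numbered elements (the b positions), which receive the neighbouring a values from the CDAB swizzle, while the even-numbered elements keep their original a values from m0. The swizzle operation may seem limited on its own, but the masked variant makes other permutations possible as well.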

user1715122