__m128i a = _mm_setr_epi16(1,2,3,4,5,6,7,8); // 8x16-bit

I want to split this register into 2 vectors, each containing 4x32-bit elements:

__m128i x = _mm_setr_epi32(1,3,5,7);
__m128i y = _mm_setr_epi32(2,4,6,8);

Is this possible with intrinsics?

In RAM, I have raw data as 16-bit words, e.g. 1,2,3,4,5,6,7,8. The goal is to split this stream into the real part (1,3,5,7) and the imaginary part (2,4,6,8).
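For reference, this is a plain scalar version of the split being asked for. It is only a sketch: it assumes the samples are signed `int16_t` (the question doesn't say) and that the output should be two `int32_t` arrays, and the name `split_iq_scalar` is made up for illustration. A loop like this is handy for checking any SIMD version against.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: split n_pairs interleaved (re, im)
   samples into two arrays. Assumes signed 16-bit input; each value is
   sign-extended by the implicit int16_t -> int32_t conversion. */
static void split_iq_scalar(const int16_t *src, size_t n_pairs,
                            int32_t *re, int32_t *im)
{
    for (size_t i = 0; i < n_pairs; i++) {
        re[i] = src[2 * i];     /* even-indexed elements: real part */
        im[i] = src[2 * i + 1]; /* odd-indexed elements: imaginary part */
    }
}
```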

Peter Cordes
Zvi Vered
  • What element size do you want `x` and `y` to have? `__128` isn't a type name. If you meant `__m128`, that's a floating-point type so you have to convert after widening to 32-bit elements. Those initializer lists are ambiguous; maybe `_mm_setr_epi32(1,3,5,7)` would describe what you want. – Peter Cordes Apr 10 '23 at 21:28
  • For the actual unpacking, `_mm_unpacklo_epi16` and `_mm_unpackhi_epi16` (with a zeroed vector) would give you `1,2,3,4` / `5,6,7,8`, assuming you want zero-extension. The low half could use SSE4.1 `pmovzxwd` (or `pmovsxwd` for sign extension). But to get the de-interleaving, just mask and shift, like `low_halves = _mm_and_si128(v, _mm_set1_epi32(0x0000FFFF))` and `high_halves = _mm_srli_epi32(v, 16)`. – Peter Cordes Apr 10 '23 at 21:31
  • An alternative to `_mm_and_si128` for isolating the low halves (zeroing the high halves) would be SSE4.1 `_mm_blend_epi16(v, _mm_setzero_si128(), 0xAA)` (if you really had to save registers, you could even blend with the empty halves of `high_halves` obtained by shifting right). – chtz Apr 11 '23 at 00:18
  • Hi Peter, chtz. Sorry for the partial info. I fixed the original question. Now it describes what I'm looking for. Thank you very much. – Zvi Vered Apr 11 '23 at 03:50
  • Ok, so 32-bit elements, presumably integer in `__m128i`. Also, normally you'd number elements from 0, like `0,1,...,6,7`, with the same element indices you'd use for shuffles. Not a big deal here, and variable-control shuffles aren't needed for this. – Peter Cordes Apr 11 '23 at 03:54
  • Hi Peter. Sorry, I didn't understand your solution. Should I use `_mm_shuffle_epi32`? How should I set imm8? Thank you very much. – Zvi Vered Apr 11 '23 at 04:15
  • No, `_mm_shuffle_epi32` wouldn't make any sense. All your data stays within the 32-bit element it started in; you just need to zero-extend or sign-extend a 16-bit half into the 32-bit element, e.g. with `_mm_srli_epi32` as I said. – Peter Cordes Apr 11 '23 at 04:54
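The options from the comments can be compared side by side on the question's example vector. This is a sketch: the helper names (`unpack_lo_zext`, `low_halves_mask`, `low_halves_blend`, etc.) are hypothetical, and `low_halves_blend` uses the GCC/Clang `target("sse4.1")` function attribute so the file builds without `-msse4.1` (it still needs an SSE4.1-capable CPU at run time). The unpacks produce contiguous halves, while mask/shift/blend produce the de-interleaved real and imaginary parts.

```c
#include <emmintrin.h> /* SSE2 */
#include <smmintrin.h> /* SSE4.1, only needed for low_halves_blend() */
#include <stdint.h>

/* Zero-extending unpacks: lanes hold elements 0..3 and 4..7,
   i.e. contiguous halves (1,2,3,4 / 5,6,7,8), NOT de-interleaved. */
static __m128i unpack_lo_zext(__m128i v) { return _mm_unpacklo_epi16(v, _mm_setzero_si128()); }
static __m128i unpack_hi_zext(__m128i v) { return _mm_unpackhi_epi16(v, _mm_setzero_si128()); }

/* De-interleave without moving data between 32-bit lanes:
   keep the low 16 bits of each lane (elements 0,2,4,6), zero-extended. */
static __m128i low_halves_mask(__m128i v)
{
    return _mm_and_si128(v, _mm_set1_epi32(0x0000FFFF));
}

/* Shift the high 16 bits of each lane down (elements 1,3,5,7), zero-extended. */
static __m128i high_halves_shift(__m128i v)
{
    return _mm_srli_epi32(v, 16);
}

/* SSE4.1 alternative to the AND: imm8 bits 1,3,5,7 (0xAA) take the odd
   16-bit elements from the zero vector. target() is a GCC/Clang extension. */
__attribute__((target("sse4.1")))
static __m128i low_halves_blend(__m128i v)
{
    return _mm_blend_epi16(v, _mm_setzero_si128(), 0xAA);
}

/* Store the four 32-bit lanes so they can be inspected. */
static void store_epi32(int32_t out[4], __m128i v)
{
    _mm_storeu_si128((__m128i *)out, v);
}
```

Every result lane is computed only from the same 32-bit lane of the input, which is why no shuffle control is needed for this split.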

1 Answer


Assuming you have everything loaded into an __m128i and you're dealing with signed integers, I think the easiest way would be:

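__m128i x = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16); // low 16 bits of each lane (1,3,5,7), sign-extended
__m128i y = _mm_srai_epi32(a, 16);                     // high 16 bits of each lane (2,4,6,8), sign-extended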
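The same two shifts can be wrapped in a loop to split a whole stream from RAM, 4 (re, im) pairs per 16-byte load. A minimal sketch, assuming signed `int16_t` input, separate `int32_t` output arrays, and a hypothetical function name `split_iq_sse2`; it uses unaligned loads/stores and finishes any leftover pairs with scalar code.

```c
#include <emmintrin.h> /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split n_pairs interleaved signed 16-bit (re, im)
   samples into two int32 arrays, 4 pairs (one 16-byte vector) per step. */
static void split_iq_sse2(const int16_t *src, size_t n_pairs,
                          int32_t *re, int32_t *im)
{
    size_t i = 0;
    for (; i + 4 <= n_pairs; i += 4) {
        /* Unaligned load of 8 int16 = 4 (re, im) pairs. */
        __m128i a = _mm_loadu_si128((const __m128i *)(src + 2 * i));
        /* Low half of each 32-bit lane: shift left, then arithmetic shift right. */
        __m128i x = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
        /* High half of each lane: a single arithmetic right shift. */
        __m128i y = _mm_srai_epi32(a, 16);
        _mm_storeu_si128((__m128i *)(re + i), x);
        _mm_storeu_si128((__m128i *)(im + i), y);
    }
    /* Scalar tail for the remaining 0..3 pairs. */
    for (; i < n_pairs; i++) {
        re[i] = src[2 * i];
        im[i] = src[2 * i + 1];
    }
}
```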

For unsigned integers, as Peter mentioned in the comments:

__m128i x = _mm_and_si128(a, _mm_set1_epi32(0x0000FFFF)); // low 16 bits of each lane, zero-extended
__m128i y = _mm_srli_epi32(a, 16);                         // high 16 bits of each lane, zero-extended
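To see why the signed and unsigned versions differ, feed both the same bits. A small sketch with hypothetical helper names `split_u16` and `split_s16`, applied to a vector where every 16-bit element is `0xFFFF`: the zero-extending version yields 65535 in every lane, the sign-extending one yields -1.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: zero-extending split, for unsigned 16-bit data. */
static void split_u16(__m128i a, __m128i *x, __m128i *y)
{
    *x = _mm_and_si128(a, _mm_set1_epi32(0x0000FFFF)); /* low halves  */
    *y = _mm_srli_epi32(a, 16);                         /* high halves */
}

/* Hypothetical helper: sign-extending split, for signed 16-bit data. */
static void split_s16(__m128i a, __m128i *x, __m128i *y)
{
    *x = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16); /* low halves  */
    *y = _mm_srai_epi32(a, 16);                     /* high halves */
}
```

Pick the one that matches how the raw 16-bit samples are meant to be interpreted.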
nemequ
  • That's assuming you want sign-extension. If you just want zero-extension, it's only 2 instructions, `_mm_srli_epi32(v, 16)` and `_mm_and_si128(v, _mm_set1_epi32(0x0000FFFF))`. – Peter Cordes Apr 11 '23 at 04:53
  • Hi Peter. Highly appreciate your help. Best regards. – Zvi Vered Apr 11 '23 at 04:56
  • For sign-extending the lower half, instead of left- and right-shifting you could also do `_mm_madd_epi16(a, _mm_set1_epi32(1))`. – chtz May 18 '23 at 20:03
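chtz's `_mm_madd_epi16` trick works because `pmaddwd` multiplies corresponding signed 16-bit elements and adds each adjacent pair of products into a 32-bit lane. `_mm_set1_epi32(1)` has 16-bit elements 1,0,1,0,..., so every lane becomes `lo*1 + hi*0`, which is `lo` sign-extended. A sketch comparing it against the shift pair (helper names are hypothetical):

```c
#include <emmintrin.h> /* SSE2: _mm_madd_epi16 is pmaddwd */
#include <stdint.h>

/* Each 32-bit lane: lo * 1 + hi * 0 = sign-extended low 16 bits. */
static __m128i low_sext_madd(__m128i a)
{
    return _mm_madd_epi16(a, _mm_set1_epi32(1));
}

/* The shift-pair version from the answer, for comparison. */
static __m128i low_sext_shift(__m128i a)
{
    return _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
}
```

It's one instruction instead of two shifts, but `pmaddwd` goes through the multiplier, so whether it's actually faster depends on the CPU and the surrounding code.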