Split 16-bit vector (__m128i) into 2 vectors of odd and even positions with Intel intrinsics

Question

__m128i a = {1,2,3,4,5,6,7,8}; //8x16bit

I want to split this register into 2 vectors, each containing 4x32bit:

__m128i x = {1,3,5,7}; //4x32bit
__m128i y = {2,4,6,8}; //4x32bit

Is it possible with intrinsic code?

In RAM, I have raw data of 16-bit words, e.g. 1,2,3,4,5,6,7,8. The goal is to split this stream into a real part (1,3,5,7) and an imaginary part (2,4,6,8).

What element size do you want `x` and `y` to have? `__128` isn't a type name. If you meant `__m128`, that's a floating-point type so you have to convert after widening to 32-bit elements. Those initializer lists are ambiguous; maybe `_mm_setr_epi32(1,3,5,7)` would describe what you want. — Peter Cordes, Apr 10 '23 at 21:28
For the actual unpacking, `_mm_unpacklo_epi16` and `_mm_unpackhi_epi16` (with a zeroed vector) would give you `1,2,3,4` / `5,6,7,8`, assuming you want zero-extension. The low half could use SSE4.1 `pmovzxwd` (or `pmovsxwd` sign extension). But to get interleaving, just mask and shift, like `low_halves = _mm_and_si128` and `high_halves = _mm_srli_epi32(v, 16)`. — Peter Cordes, Apr 10 '23 at 21:31
An alternative for masking out the low halves with `_mm_and_si128` would be `_mm_blend_epi16(v, _mm_setzero_si128(), 0xAA)` (if you really had to save registers, you could even blend with the empty halves of `high_halves` obtained by shifting right). — chtz, Apr 11 '23 at 00:18
Hi Peter, chtz. Sorry for the partial info. I fixed the original question. Now it describes what I'm looking for. Thank you very much. — Zvi Vered, Apr 11 '23 at 03:50
Ok, so 32-bit elements, presumably integer in `__m128i`. Also, normally you'd number elements from 0, like `0,1,...,6,7`, with the same element indices you'd use for shuffles. Not a big deal here, and variable-control shuffles aren't needed for this. — Peter Cordes, Apr 11 '23 at 03:54
Hi Peter. Sorry. Did not understand your solution. Should I use _mm_shuffle_epi32 ? How should I set imm8 ? Thank you very much. — Zvi Vered, Apr 11 '23 at 04:15
No, `_mm_shuffle_epi32` wouldn't make any sense. All your data stays within the 32-bit element it started in, you just need to zero-extend or sign-extend a 16-bit half into the 32-bit element, like with `_mm_srli_epi32` like I said. — Peter Cordes, Apr 11 '23 at 04:54
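
To make the distinction in the comments above concrete, here is a minimal sketch of the unpack approach, which widens the elements in their original order (`1,2,3,4` / `5,6,7,8`) instead of separating the two positions within each pair. The helper name `widen_in_order_u16` and the array-based interface are assumptions made for illustration; only the two unpack intrinsics come from the discussion.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: zero-extend 8 x 16-bit values to 2 x (4 x 32-bit),
 * keeping the original element order. */
static void widen_in_order_u16(const uint16_t src[8],
                               uint32_t lo[4], uint32_t hi[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)src);
    __m128i zero = _mm_setzero_si128();

    /* Interleaving with zero puts 0 in the upper 16 bits of each
     * 32-bit element, i.e. zero-extension on little-endian x86. */
    _mm_storeu_si128((__m128i *)lo, _mm_unpacklo_epi16(v, zero)); /* src[0..3] */
    _mm_storeu_si128((__m128i *)hi, _mm_unpackhi_epi16(v, zero)); /* src[4..7] */
}
```

This is useful when you only need wider elements, but it is not the real/imaginary split asked for, since neighbouring values stay together.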

nemequ · Answer 1 · 2023-04-11T11:57:14.637

2

Assuming you have everything loaded into an __m128i and you're dealing with signed integers, I think the easiest way would be:

__m128i x = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
__m128i y = _mm_srai_epi32(a, 16);

For unsigned integers, as Peter mentioned in the comments:

__m128i x = _mm_and_si128(a, _mm_set1_epi32(0x0000FFFF));
__m128i y = _mm_srli_epi32(a, 16);
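
As a rough, self-contained sketch of both variants, the snippets above could be wrapped as follows. The function names `split_s16` and `split_u16`, the `re`/`im` output arrays, and the unaligned loads and stores are assumptions for illustration, not part of the answer; the sketch relies on x86 being little-endian, so the first 16-bit word of each pair sits in the low half of its 32-bit element.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: signed split with sign-extension. */
static void split_s16(const int16_t src[8], int32_t re[4], int32_t im[4])
{
    __m128i a = _mm_loadu_si128((const __m128i *)src);
    /* Move the low half to the top, then arithmetic-shift it back down. */
    __m128i x = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
    /* Arithmetic shift brings the high half down with its sign. */
    __m128i y = _mm_srai_epi32(a, 16);
    _mm_storeu_si128((__m128i *)re, x);
    _mm_storeu_si128((__m128i *)im, y);
}

/* Hypothetical helper: unsigned split with zero-extension. */
static void split_u16(const uint16_t src[8], uint32_t re[4], uint32_t im[4])
{
    __m128i a = _mm_loadu_si128((const __m128i *)src);
    /* Keep only the low 16 bits of each 32-bit element. */
    __m128i x = _mm_and_si128(a, _mm_set1_epi32(0x0000FFFF));
    /* Logical shift brings the high half down and fills with zeros. */
    __m128i y = _mm_srli_epi32(a, 16);
    _mm_storeu_si128((__m128i *)re, x);
    _mm_storeu_si128((__m128i *)im, y);
}
```

With the input `1,2,3,4,5,6,7,8`, both helpers would write `1,3,5,7` to `re` and `2,4,6,8` to `im`; they differ only for values with the top bit set.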

edited Apr 11 '23 at 11:57

answered Apr 11 '23 at 04:20



That's assuming you want sign-extension. If you just want zero-extension, it's only 2 instructions, `_mm_srli_epi32(v, 16)` and `_mm_and_si128(v, _mm_set1_epi32(0x0000FFFF))`. – Peter Cordes Apr 11 '23 at 04:53
Hi Peter. Highly appreciate your help. Best regards. – Zvi Vered Apr 11 '23 at 04:56
For sign-extending the lower half, instead of left- and right-shifting you could also do `_mm_madd_epi16(a, _mm_set1_epi32(1))`. – chtz May 18 '23 at 20:03
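
A small sketch of why the `_mm_madd_epi16` suggestion works: viewed as 16-bit lanes, `_mm_set1_epi32(1)` is the pattern `1,0,1,0,...`, so each 32-bit result is `low*1 + high*0`, computed as a signed product, which is the sign-extended low half. The helper names below are hypothetical and exist only to compare the two approaches.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: sign-extend the low 16-bit half of each
 * 32-bit element using a single multiply-add. */
static void sext_low_madd(const int16_t src[8], int32_t out[4])
{
    __m128i a = _mm_loadu_si128((const __m128i *)src);
    /* Per 32-bit element: src[2i]*1 + src[2i+1]*0, as signed 16-bit products. */
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(a, _mm_set1_epi32(1)));
}

/* Reference version using the shift pair from the answer. */
static void sext_low_shift(const int16_t src[8], int32_t out[4])
{
    __m128i a = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out,
                     _mm_srai_epi32(_mm_slli_epi32(a, 16), 16));
}
```

Both helpers should produce identical output for any input; which one is faster depends on the target CPU, so it is worth measuring rather than assuming.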
