I'm taking my first steps with SIMD and I was wondering what the right approach to the following problem is. Consider two vectors:
+---+---+---+---+ +---+---+---+---+
| 0 | 1 | 2 | 3 | | 4 | 5 | 6 | 7 |
+---+---+---+---+ +---+---+---+---+
How can I "interleave" the elements of those vectors so that they become:
+---+---+---+---+ +---+---+---+---+
| 0 | 4 | 1 | 5 | | 2 | 6 | 3 | 7 |
+---+---+---+---+ +---+---+---+---+
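For reference, the target layout is just a "zip" of the two inputs. A scalar sketch that a SIMD version could be checked against (interleave_ref is a hypothetical helper name, not a library function):

```c
#include <stddef.h>  /* size_t */

/* Scalar reference interleave: out[2*i] = a[i], out[2*i + 1] = b[i].
 * With n = 4, {0,1,2,3} and {4,5,6,7} become {0,4,1,5,2,6,3,7}. */
static void interleave_ref(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[2 * i]     = a[i];  /* element from the first vector */
        out[2 * i + 1] = b[i];  /* matching element from the second vector */
    }
}
```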
I was surprised I could not find a single instruction for this, given the great many kinds of shuffles, broadcasts, permutes, and so on. It could probably be done with some combination of unpacklo and unpackhi, but I was wondering whether there is a canonical way of doing it, as it seems to be quite a common problem (SoA vs. AoS). For simplicity, let's assume AVX(2) and vectors of four floats.
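For the four-float case, a minimal sketch of the unpacklo/unpackhi idea, assuming each vector fits in one 128-bit SSE register (__m128); interleave4f is a hypothetical helper name. _mm_unpacklo_ps and _mm_unpackhi_ps compile to unpcklps and unpckhps, and since a single 128-bit register has no lane boundary to cross, the two unpacks alone should give the desired layout:

```c
#include <immintrin.h>  /* SSE intrinsics: _mm_unpacklo_ps, _mm_unpackhi_ps */

/* Interleave two 4-float vectors:
 *   a  = {a0, a1, a2, a3}, b = {b0, b1, b2, b3}
 *   lo = {a0, b0, a1, b1}  (unpcklps)
 *   hi = {a2, b2, a3, b3}  (unpckhps)
 * out receives lo followed by hi. */
static void interleave4f(const float a_in[4], const float b_in[4], float out[8])
{
    __m128 a  = _mm_loadu_ps(a_in);
    __m128 b  = _mm_loadu_ps(b_in);
    __m128 lo = _mm_unpacklo_ps(a, b);  /* low halves, interleaved */
    __m128 hi = _mm_unpackhi_ps(a, b);  /* high halves, interleaved */
    _mm_storeu_ps(out, lo);
    _mm_storeu_ps(out + 4, hi);
}
```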
Edit:
Floats vs. doubles
The comment below (correctly) suggests that I should use unpcklps and unpckhps for floats. Which instructions should I use to unpack vectors of four doubles? I'm asking because of how _mm256_unpacklo_pd/_mm256_unpackhi_pd are documented (this is the entry for _mm256_unpackhi_pd):
Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b, and store the results in dst.
DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) {
dst[63:0] := src1[127:64]
dst[127:64] := src2[127:64]
RETURN dst[127:0]
}
dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128])
dst[MAX:256] := 0
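The per-lane rule in the pseudocode can be modeled in scalar C to see where the 128-bit lane boundary comes in. This is only a sketch: emulate_unpacklo_pd and emulate_unpackhi_pd are hypothetical names, a __m256d is modeled as double[4] (elements 0-1 form the low lane, 2-3 the high lane), and the lo variant uses the matching low-qword rule (dst[63:0] := src1[63:0], dst[127:64] := src2[63:0]):

```c
/* Scalar model of the in-lane AVX double-precision unpacks.
 * Each 128-bit lane holds two doubles and is processed independently. */
static void emulate_unpacklo_pd(const double a[4], const double b[4], double dst[4])
{
    for (int lane = 0; lane < 2; ++lane) {
        int base = 2 * lane;          /* first element of this 128-bit lane */
        dst[base]     = a[base];      /* low qword of a's lane */
        dst[base + 1] = b[base];      /* low qword of b's lane */
    }
}

static void emulate_unpackhi_pd(const double a[4], const double b[4], double dst[4])
{
    for (int lane = 0; lane < 2; ++lane) {
        int base = 2 * lane;
        dst[base]     = a[base + 1];  /* high qword of a's lane */
        dst[base + 1] = b[base + 1];  /* high qword of b's lane */
    }
}
```

With a = {0,1,2,3} and b = {4,5,6,7}, the lo model yields {0,4,2,6} and the hi model {1,5,3,7}: elements never move between the two lanes.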
So, applied to the two vectors above, _mm256_unpacklo_pd and _mm256_unpackhi_pd apparently produce:
+---+---+---+---+ +---+---+---+---+
| 0 | 4 | 2 | 6 | | 1 | 5 | 3 | 7 |
+---+---+---+---+ +---+---+---+---+
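Since both unpacks only work within each 128-bit lane, one way to get the full interleave for doubles is to follow them with a lane-crossing shuffle. A minimal sketch, assuming GCC or Clang (for the target attribute; compiling with -mavx works too) and an AVX-capable CPU; interleave4d is a hypothetical helper name:

```c
#include <immintrin.h>  /* AVX intrinsics */

/* Interleave two 4-double vectors across the full 256-bit width:
 *   a = {a0,a1,a2,a3}, b = {b0,b1,b2,b3}
 *   out = {a0,b0,a1,b1, a2,b2,a3,b3}
 * target("avx") enables AVX code generation for this function only. */
__attribute__((target("avx")))
static void interleave4d(const double a_in[4], const double b_in[4], double out[8])
{
    __m256d a = _mm256_loadu_pd(a_in);
    __m256d b = _mm256_loadu_pd(b_in);

    /* In-lane unpacks: lo = {a0,b0,a2,b2}, hi = {a1,b1,a3,b3} */
    __m256d lo = _mm256_unpacklo_pd(a, b);
    __m256d hi = _mm256_unpackhi_pd(a, b);

    /* vperm2f128 recombines whole 128-bit lanes:
     * 0x20 -> low lane of lo, then low lane of hi   = {a0,b0,a1,b1}
     * 0x31 -> high lane of lo, then high lane of hi = {a2,b2,a3,b3} */
    __m256d first  = _mm256_permute2f128_pd(lo, hi, 0x20);
    __m256d second = _mm256_permute2f128_pd(lo, hi, 0x31);

    _mm256_storeu_pd(out, first);
    _mm256_storeu_pd(out + 4, second);
}
```

_mm256_permute2f128_pd only needs AVX; with AVX2, _mm256_permute4x64_pd is another lane-crossing option (for example, to pre-arrange the inputs before the unpacks).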