I'm playing around with writing some AVX-optimised functions. I want to use a standard unsigned integer argument type like `uint64_t` instead of the 256-bit unsigned integer the AVX instructions require (`__m256i_u`). Is it possible to do the cast?

-
Is either one of these types a part of the C language standard? – Nov 27 '22 at 09:40
-
What is the actual type of `__m256i_u`? From the `_u` suffix it kind of looks like it's a *union*, so you had better look at what's inside that union. And what does the documentation say? What do your textbooks say? Your tutorials? – Some programmer dude Nov 27 '22 at 09:42
-
at least in my c headers it is defined as `typedef long long __m256i_u __attribute__((__vector_size__(32), __aligned__(1)));` – Moldytzu Nov 27 '22 at 09:45
-
@Someprogrammerdude the `_u` stands for "unaligned" in this case. gcc/clang use a pointer to this type as argument for unaligned loads/stores. – chtz Nov 27 '22 at 10:01
-
@Moldytzu You need to give more details on what you are trying to achieve. Do you want to put the 64-bit integer in the lowest part of your `__m256i`, or do you want to quadruple (broadcast) it? Please provide a [mre]. – chtz Nov 27 '22 at 10:05
-
I'd like to fill the variable like this `((__m256i_u)value << 192) | ((__m256i_u)value << 128) | ((__m256i_u)value << 64) | (__m256i_u)value `, so it quadruples the value. – Moldytzu Nov 27 '22 at 10:09
-
`_mm256_set1_epi64x`? (The instructions are happy to take a `__m256i`; you shouldn't look at the implementation detail with `_u`.) – Marc Glisse Nov 27 '22 at 10:45
-
Or if you are programming specifically for gcc and using a basic operation, `vec+42` automatically does the broadcast for you. – Marc Glisse Nov 27 '22 at 10:47
1 Answer
No, Intel's intrinsics API doesn't allow actual C casts between integer and vector types; I think not even between `uint64_t` and `__m64` (a 64-bit MMX vector).
Use `_mm256_set...` to get value(s) into a vector, with a broadcast or a list of operands, and `_mm_cvtsi128_si64` (with `_mm256_castsi256_si128` when necessary) to get the low value out of a vector. See Intel's intrinsics guide for the `cvt` and `_mm256_set` intrinsics; Google the intrinsic name for examples of using it, especially with `site:stackoverflow.com`. You might want to limit your intrinsics guide searches to SSE4, not AVX2, to limit the number of intrinsics to wade through, and so that the parameter lists are shorter: it's more immediately visible that `_mm_set_epi32()` takes 4 `int` args, for a total of 128 bits.
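As a rough illustration of the intrinsics named above, here is a minimal sketch. The function names `broadcast_u64` and `low_of_set` and the `AVX_FN` macro are made up for this example. It assumes GCC or Clang on x86-64: the `target("avx")` attribute is a GCC/Clang feature used so the code builds without `-mavx`, and `_mm_cvtsi128_si64` only exists in 64-bit mode.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. Enables AVX for one function without -mavx. */
#define AVX_FN __attribute__((target("avx")))

/* Broadcast x into all four 64-bit lanes, then store the 32 bytes to out. */
AVX_FN void broadcast_u64(uint64_t x, uint64_t out[4])
{
    __m256i v = _mm256_set1_epi64x((long long)x);
    _mm256_storeu_si256((__m256i *)out, v);   /* unaligned store */
}

/* Build a vector from four operands and return its lowest 64-bit lane.
   _mm256_set_epi64x takes the highest element first, so e0 is the low lane. */
AVX_FN uint64_t low_of_set(uint64_t e0, uint64_t e1, uint64_t e2, uint64_t e3)
{
    __m256i v = _mm256_set_epi64x((long long)e3, (long long)e2,
                                  (long long)e1, (long long)e0);
    __m128i lo = _mm256_castsi256_si128(v);   /* keeps the low 128 bits */
    return (uint64_t)_mm_cvtsi128_si64(lo);   /* low 64 bits as an integer */
}
```

So a function can keep a plain `uint64_t` parameter and do the broadcast internally, which is probably closer to what the question wants than a cast.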
See also "What are the names and meanings of the intrinsic vector element types, like epi64x or pi32?" re: the existence of `epi64x` vs. `epi64` (MMX to XMM vs. 64-bit integer).
Also, use `__m256i`, not GCC's internal `__m256i_u` unaligned type. Use `__m256i v = _mm256_loadu_si256((const __m256i*) ptr);` to do an unaligned load.
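A small sketch of an unaligned load and store through plain `__m256i` pointers, with no `__m256i_u` in user code. The helper name `copy32_unaligned` and the `AVX_FN` macro are hypothetical, and the `target("avx")` attribute again assumes GCC or Clang (with `-mavx` on the command line it can be dropped).

```c
#include <immintrin.h>

/* Assumption: GCC/Clang. Enables AVX for one function without -mavx. */
#define AVX_FN __attribute__((target("avx")))

/* Copy 32 bytes between arbitrary addresses; neither pointer has to be
   32-byte aligned because loadu/storeu make no alignment assumption. */
AVX_FN void copy32_unaligned(void *dst, const void *src)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
}
```

The pointer cast only satisfies the intrinsic's signature; the unaligned behaviour comes from the `loadu`/`storeu` intrinsics themselves.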
