C: is it possible to cast a uint64_t to const __m256i_u?

Question

I'm experimenting with writing some AVX-optimised functions. I want to use a standard unsigned integer argument type like uint64_t instead of the 256-bit unsigned integer the AVX instructions require (__m256i_u). Is it possible to do the cast?

Is either one of these types a part of the C language standard? — , Nov 27 '22 at 09:40
What is the actual type of `__m256i_u`? From the `_u` suffix it kind of looks like it's a *union*, so you had better look at what's inside that union. And what does the documentation say? What do your textbooks say? Your tutorials? — Some programmer dude, Nov 27 '22 at 09:42
at least in my c headers it is defined as `typedef long long __m256i_u __attribute__((__vector_size__(32), __aligned__(1)));` — Moldytzu, Nov 27 '22 at 09:45
@Someprogrammerdude the `_u` stands for "unaligned" in this case. gcc/clang use a pointer to this type as argument for unaligned loads/stores. — chtz, Nov 27 '22 at 10:01
@Moldytzu You need to give more details on what you are trying to achieve. Do you want to put the 64-bit integer in the lowest part of your `__m256i`, or do you want to quadruple (broadcast) it? Please provide a [mre]. — chtz, Nov 27 '22 at 10:05
I'd like to fill the variable like this `((__m256i_u)value << 192) | ((__m256i_u)value << 128) | ((__m256i_u)value << 64) | (__m256i_u)value `, so it quadruples the value. — Moldytzu, Nov 27 '22 at 10:09
_mm256_set1_epi64x? (The instructions are happy to take a __m256i; you shouldn't look at the implementation detail with _u.) — Marc Glisse, Nov 27 '22 at 10:45
Or if you are programming specifically for gcc and using a basic operation, `vec+42` automatically does the broadcast for you. — Marc Glisse, Nov 27 '22 at 10:47

Peter Cordes · Accepted Answer · 2022-11-27T13:50:46.950

No, Intel's intrinsics API doesn't allow actual C casts between integer and vector types, I think not even between uint64_t and __m64 (a 64-bit MMX vector).

Use _mm256_set... to get value(s) into a vector, with a broadcast or a list of operands, and _mm_cvtsi128_si64 (with _mm256_castsi256_si128 when necessary) to get the low value out of a vector. See Intel's intrinsics guide for cvt and _mm256_set intrinsics; Google the intrinsic name for examples of using it, especially with site:stackoverflow.com. You might want to limit your intrinsics guide searches to SSE4, not AVX2, to limit the number of intrinsics to wade through. The parameter lists are also shorter that way: it's more immediately visible that _mm_set_epi32() takes 4 int args, for a total of 128 bits.
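
As a rough illustration of the intrinsics named above, here is a minimal sketch of the broadcast the question asks for, plus getting the low element back out. The helper names broadcast_u64 and low_lane_u64 and the AVX_FN macro are hypothetical, made up for this example. The target("avx") attribute is a GCC/Clang extension used only so the snippet compiles without -mavx; with other compilers, drop it and enable AVX through the compiler options. The code assumes an x86-64 build, since _mm_cvtsi128_si64 is only available there.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC or Clang. This lets the functions below use AVX
   intrinsics even when the file is not compiled with -mavx. */
#define AVX_FN __attribute__((target("avx")))

/* Hypothetical helper: copy x into all four 64-bit lanes of a vector,
   then store the 32 bytes to out (no alignment required). */
AVX_FN static void broadcast_u64(uint64_t x, uint64_t out[4])
{
    __m256i v = _mm256_set1_epi64x((long long)x);   /* broadcast, not a cast */
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Hypothetical helper: build a vector from a list of operands and
   return its low 64-bit element. */
AVX_FN static uint64_t low_lane_u64(uint64_t e3, uint64_t e2,
                                    uint64_t e1, uint64_t e0)
{
    /* _mm256_set_epi64x takes the highest element first, so e0 is lane 0. */
    __m256i v = _mm256_set_epi64x((long long)e3, (long long)e2,
                                  (long long)e1, (long long)e0);
    __m128i lo = _mm256_castsi256_si128(v);   /* keep the low 128 bits */
    return (uint64_t)_mm_cvtsi128_si64(lo);   /* low 64 bits of those */
}
```

Calling broadcast_u64(value, out) fills out[0] through out[3] with value, which is the quadrupling described in the comments under the question.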

See also What are the names and meanings of the intrinsic vector element types, like epi64x or pi32? re: the existence of epi64x vs. epi64 (MMX to XMM vs. 64-bit integer)

Also, use __m256i, not GCC's internal __m256i_u unaligned type. Use __m256i v = _mm256_loadu_si256((const __m256i*) ptr); to do an unaligned load.
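
A minimal sketch of that unaligned load, paired with the matching unaligned store so the result can be inspected. The helper name copy32_unaligned is hypothetical, and the target("avx") attribute is again a GCC/Clang extension used so the snippet compiles without -mavx.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 32 bytes from src to dst through a __m256i.
   Neither pointer needs any particular alignment. */
__attribute__((target("avx")))
static void copy32_unaligned(const void *src, void *dst)
{
    /* The pointer cast only satisfies the intrinsic's parameter type;
       the loadu/storeu intrinsics are what permit unaligned addresses. */
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
}
```

Plain __m256i appears everywhere in user code here; the unaligned __m256i_u type stays an implementation detail inside the header.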
