1

I'm playing around with writing some AVX-optimised functions. I want to use a standard unsigned integer argument type like uint64_t instead of the 256-bit integer vector type the AVX intrinsics require (__m256i_u). Is it possible to cast between the two?

Moldytzu
  • Is either one of these types a part of the C language standard? –  Nov 27 '22 at 09:40
  • What is the actual type of `__m256i_u`? From the `_u` suffix it kind of looks like it's a *union*, so you had better look at what's inside that union. And what does the documentation say? What do your textbooks say? Your tutorials? – Some programmer dude Nov 27 '22 at 09:42
  • at least in my c headers it is defined as `typedef long long __m256i_u __attribute__((__vector_size__(32), __aligned__(1)));` – Moldytzu Nov 27 '22 at 09:45
  • 2
    @Someprogrammerdude the `_u` stands for "unaligned" in this case. gcc/clang use a pointer to this type as argument for unaligned loads/stores. – chtz Nov 27 '22 at 10:01
  • 1
    @Moldytzu You need to give more details on what you are trying to achieve. Do you want to put the 64-bit integer in the lowest part of your `__m256i`, or do you want to quadruple (broadcast) it? Please provide a [mre]. – chtz Nov 27 '22 at 10:05
  • I'd like to fill the variable like this `((__m256i_u)value << 192) | ((__m256i_u)value << 128) | ((__m256i_u)value << 64) | (__m256i_u)value `, so it quadruples the value. – Moldytzu Nov 27 '22 at 10:09
  • 2
    _mm256_set1_epi64x? (The intrinsics are happy to take a __m256i; you shouldn't look at the implementation detail with _u.) – Marc Glisse Nov 27 '22 at 10:45
  • 1
    Or if you are programming specifically for gcc and using a basic operation, `vec+42` automatically does the broadcast for you. – Marc Glisse Nov 27 '22 at 10:47

1 Answer

2

No, Intel's intrinsics API doesn't allow actual C casts between integer and vector types, I think not even between uint64_t and __m64 (a 64-bit MMX vector).

Use _mm256_set... (with a broadcast or a list of operands) to get value(s) into a vector, and _mm_cvtsi128_si64 (with _mm256_castsi256_si128 when necessary) to get the low value out of one. See Intel's intrinsics guide for the cvt and _mm256_set intrinsics; Google the intrinsic name for examples of using it, especially with site:stackoverflow.com. You might want to limit your intrinsics guide searches to SSE4, not AVX2, to cut down the number of intrinsics to wade through. The parameter lists are also shorter that way: it's more immediately visible that _mm_set_epi32() takes 4 int args, for a total of 128 bits.
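As a minimal sketch of the broadcast asked about in the comments: the helper names `broadcast_u64` and `low_lane_after_broadcast` below are made up for illustration, and the code assumes x86-64 with GCC or clang. The `target("avx")` attribute is only there so the file compiles without `-mavx`; drop it if you already build with `-mavx` or `-mavx2`.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `value` into all four 64-bit lanes of a
   256-bit vector, then write the lanes to out[0..3]. */
__attribute__((target("avx")))
void broadcast_u64(uint64_t value, uint64_t out[4])
{
    assert(out != NULL);
    /* The intrinsic takes a signed long long, so convert explicitly. */
    __m256i v = _mm256_set1_epi64x((long long)value);
    /* Unaligned store, so `out` needs no special alignment. */
    _mm256_storeu_si256((__m256i *)out, v);
}

/* Hypothetical helper: broadcast `value`, then read the low 64-bit
   lane back out as a plain integer. */
__attribute__((target("avx")))
uint64_t low_lane_after_broadcast(uint64_t value)
{
    __m256i v = _mm256_set1_epi64x((long long)value);
    /* Reinterpret the low 128 bits as an XMM vector. */
    __m128i lo = _mm256_castsi256_si128(v);
    /* Move the low 64 bits to an integer register (x86-64 only). */
    return (uint64_t)_mm_cvtsi128_si64(lo);
}
```

Keeping `uint64_t` at the function boundary and `__m256i` only inside the function gives the caller the plain integer interface the question asks for, without any cast between the two types.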

See also What are the names and meanings of the intrinsic vector element types, like epi64x or pi32? re: the existence of epi64x vs. epi64 (MMX to XMM vs. 64-bit integer)

Also, use __m256i, not GCC's internal __m256i_u unaligned type. Use __m256i v = _mm256_loadu_si256((const __m256i*) ptr); to do an unaligned load.
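A small sketch of that unaligned load, again assuming x86-64 with GCC or clang. `load_lane_unaligned` is a hypothetical helper written for this example, and the `target("avx")` attribute is only needed when compiling without `-mavx`.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load 32 bytes starting at buf + offset (any
   alignment) and return 64-bit lane `lane` (0 = lowest address).
   The caller must make sure 32 bytes are readable at buf + offset. */
__attribute__((target("avx")))
uint64_t load_lane_unaligned(const unsigned char *buf, size_t offset, int lane)
{
    assert(lane >= 0 && lane < 4);
    /* The pointer cast only satisfies the intrinsic's signature;
       loadu places no alignment requirement on the address. */
    __m256i v = _mm256_loadu_si256((const __m256i *)(buf + offset));
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, v);
    return lanes[lane];
}
```

Because x86 is little-endian, the byte at the lowest address ends up in the least significant byte of lane 0.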

Peter Cordes