
I found that the following C code compiles successfully on x86_64 with GCC 10.1.0:

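#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

typedef union {
    __m64 x;
#if defined(__arm__) || defined(__aarch64__)
    int32x2_t d[1];  /* NEON type; would need <arm_neon.h> */
#endif
    uint8_t i8u[8];
} u_m64;

int main(void)
{
    u_m64 a, b, c;

    /* Give the inputs defined values so the result is meaningful. */
    for (int k = 0; k < 8; k++) {
        a.i8u[k] = 0xF0;
        b.i8u[k] = 0x20;
    }

    c.x = a.x + b.x;  /* compiles with GCC, but which add is this? */

    for (int k = 0; k < 8; k++)
        printf("%02x ", c.i8u[k]);
    putchar('\n');
    return 0;
}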

But there are lots of add functions for __m64, like _mm_add_pi16, _mm_hadd_pi16, _mm_add_si64 and so on (the same applies to __m128, __m256, ...). So which one does the '+' operator call? And how can operator overloading be used in a C file?

    C doesn't have operator overloading the way C++ does, so if it's a numeric primitive (as opposed to something that can be logically added but is not directly a numeric primitive) I'm guessing it'll add like any other numeric primitive (ints, floats, etc.). – 404 Name Not Found Sep 26 '22 at 03:50
  • @404NameNotFound: GNU C native vectors *do* sort of overload the `+` operator, so you just need to know the underlying type, e.g. was it `typedef long long __m64 __attribute__((vector_size(8)))`, or `int` in which case it will be a SIMD operation on two packed `int32_t` (since this is x86 GCC). https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html . Of course, the Intel intrinsics API doesn't define anything about `+` working on `__m64` or `__m128i`, and it's not portable to all compilers (notably not MSVC). – Peter Cordes Sep 27 '22 at 06:50

1 Answer


Yeah, GCC and clang provide basic operators for built-in SIMD types, which is frankly so beyond stupid that it's not even remotely funny :(

Anyhow, this mechanism doesn't work the same way as operator overloading in C++. GCC and clang define __m64 as a GNU C native vector type (a scalar element type with a vector_size attribute), and the compiler itself defines the arithmetic operators for such types, just as it does for int or float. The operators exist at the language level rather than as user-defined overloads, which is why this works in plain C.

In this case it performs an ordinary element-wise (vertical) add, not a horizontal add.

However, now we hit the biggest problem: as far as the intrinsics API is concerned, __m64 has NO fixed element type, so the type itself doesn't say how its 64 bits should be split into lanes!

Within any given __m64, we could be storing any of:

  • 8 x int8
  • 4 x int16
  • 2 x int32
  • 8 x uint8
  • 4 x uint16
  • 2 x uint32

For addition (ignoring the saturating variants), that means the + operator could correspond to any one of these perfectly valid choices:

  • _mm_add_pi8
  • _mm_add_pi16
  • _mm_add_pi32

I don't know which of those instructions GCC/clang ends up emitting in this context, but whichever it is, it will be the wrong one for two of those three element widths :(

robthebloke
    GCC uses 32bit addition on `__m64`, but 64bit addition on `__m128i`: https://godbolt.org/z/67nxrj9vW -- I agree that one should not rely on that behavior. – chtz Sep 26 '22 at 09:39
  • They define `__m64` and `__m128i` in terms of [GNU C native vector extensions](https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html). Those allow you to write *portable* SIMD code, for simple enough problems that you don't need ISA-specific instructions. Having `+` do anything for `__m64` is a consequence of an implementation choice (apparently `int __attribute__((vector_size(8)))`), not something you're really intended to use on `__m64`. Only on your own types with names that imply an element size. BTW, `__m64` can also store an `int64_t`, adding in one insn even in 32-bit mode. – Peter Cordes Sep 27 '22 at 06:54