The Intel intrinsic functions have the subtype of the vector built into their names. For example, `_mm_set1_ps` has the suffix `ps`, which stands for packed single-precision, i.e. `float`. Although the meaning of most suffixes is clear, their "full name", like packed single-precision, isn't always clear from the function descriptions. I have created the following table. Unfortunately, some entries are missing. What are their values? Additional questions are below the table.

| abbreviation | full name | C/++ equivalent |
|---|---|---|
| ps | packed single-precision | float |
| ph | packed half-precision | None** |
| pd | packed double-precision | double |
| pch | packed half-precision complex | None** |
| pi8 | ??? | int8_t |
| pi16 | ??? | int16_t |
| pi32 | ??? | int32_t |
| epi8 | ??? | int8_t |
| epi16 | ??? | int16_t |
| epi32 | ??? | int32_t |
| epi64 | ??? | int64_t |
| epi64x | ??? | int64_t |

Additional questions:

  1. Have I missed any?
  2. What is the difference between epiX and piX?
  3. Why does no pi64 exist?
  4. What is the difference between epi64 and epi64x?

** I have found this, but there seems to be no standard way to represent a half-precision (complex) value in C/++. Please correct me if this has changed in any way.

Peter Cordes
Brotcrunsher
  • This is off-topic as either C or C++ - any answers related to those will be very specific to particular compilers (e.g. intel compilers). To increase chances of getting a useful reply, I suggest removing those tags and finding tag(s) related specifically to intel. – Peter Jan 30 '22 at 04:41
  • I have added the C/++ tags because of the question within the Footnote. Is this still regarded as off-topic? – Brotcrunsher Jan 30 '22 at 04:43
  • I'd argue it is. The C and C++ tags are related to standard C or standard C++ respectively, and your question is not relevant to that. Your question will be specific to particular compilers (intel compilers?) so better to tag accordingly – Peter Jan 30 '22 at 04:44
  • @Peter Actually intel intrinsics are supported by the "3 big ones": Clang, GCC, MSVC. However, as they are not standard I see your point. I have removed the tags. – Brotcrunsher Jan 30 '22 at 04:47
  • @Brotcrunsher What about the fourth? ICC, the intel compiler. Does the Intel compiler not support intel intrinsics? Feels weird. – Shambhav Jan 30 '22 at 04:49
  • @ShambhavGautam I did not say that. I just have no clue about ICC. – Brotcrunsher Jan 30 '22 at 04:52
  • @Peter: Intel defined these C / C++ extensions, but all the mainstream x86 compilers (GCC/clang/ICC and MSVC) support them with the same names (but different implementation details). I agree in this case it's not really a C question, since it's not about writing a C function using them (where interaction with other things, like aligned allocators and how to index arrays properly, are relevant), as opposed to Rust or C# using Intel intrinsics. But plenty of SSE intrinsics questions *are* valid C or C++ questions. [Some](https://stackoverflow.com/q/52112605) are even language-lawyery. – Peter Cordes Jan 30 '22 at 05:06
  • @PeterCordes C/C++ tags are mainly related to standard C/C++, not to machine instructions/support by C/C++ compilers, so the vast majority of people who read C or C++ tagged material won't have knowledge/interest in intel intrinsics. There are tags related to particular compilers (not included on the question) and people who follow those tags are more likely able to offer help. Anyway, I'll leave it there - my comment was about tagging to maximise chances of getting a useful response but I realise SO members are inconsistent/argumentative about what tagging is/isn't acceptable. – Peter Jan 30 '22 at 05:19
  • @Peter: Would you argue that questions about GCC inline asm should *only* be tagged [inline-assembly][gcc] (or [clang] or [icc]) and not [c]? Pretty sure there's so many questions in the [c] and [c++] tags that people following them should expect not to be interested in many of them, or even be able to answer them. e.g. about a specific C++ library. Intel's intrinsics are indistinguishable from a pure library with overloaded classes except for performance, or if you dereference a `__m128i*` instead of using a load or store intrinsic. (As an extension, it can alias anything like `char*`) – Peter Cordes Jan 30 '22 at 05:25
  • @Peter: Obviously questions about intrinsics *also* need to be tagged with the appropriate tags, like in this case SSE and MMX, since these are intrinsics for those CPU extensions. – Peter Cordes Jan 30 '22 at 05:26
  • @PeterCordes Yes, I would argue that a question about GCC inline asm that is tagged [inline-assembly][gcc] (or the equivalents for other compilers) need not have a C tag. The fact there are a lot of such questions that *do* have a C tag doesn't mean it added much of use or will for people tagging questions similarly in future - for either the person asking the question or the (presumably significant) number of people who follow the C tag but have no knowledge/interest of inline-assembly or deep details of gcc. – Peter Jan 30 '22 at 05:36
  • Related: [Meaning of suffix "x" in intrinsics like "\_mm256\_set1\_epi64x"](https://stackoverflow.com/q/44989391) re: MMX having already taken the `_mm_set[1]_epi64` name. – Peter Cordes Aug 19 '23 at 20:01

1 Answer

  1. The missing versions are at least `si128` and `si64`, used in bitwise operations, and `[e]pu{8,16,32,64}` for unsigned operations.

  2. `epi` and `pi` differ in the `e`, which probably means extended: `epi` targets a 128-bit XMM register, while `pi` targets a 64-bit MMX register.

  3. `pi64` does not exist because the original MMX instruction set was limited to 32-bit elements; `si64` is still available.

  4. The main argument for using `epi64x` instead of `epi64` has to do with the lack of function overloading in C. There was a need to provide set/conversion functions both for `__m128i _mm_set1_epi64(__m64)`, which moves from MMX to XMM, and for `__m128i _mm_set1_epi64x(int64_t)`, which works with integers. Additionally, it seems that in the rest of the cases the `64x` suffix is reserved for operations requiring a 64-bit architecture, as in `movq` between a register and the low half of an `__m128i`, which could be emulated by multiple instructions, and for something like `__int64 _mm_cvtsd_si64x(__m128d a)`, which converts a double to a 64-bit register target (not to memory directly).
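
As a rough illustration of points 1 and 4, the sketch below contrasts a signed (`epi8`) and an unsigned (`epu8`) saturating add on identical bit patterns, and broadcasts a plain 64-bit integer with `_mm_set1_epi64x`. It assumes an x86 target with SSE2; the helper names (`lane0_adds_epi8`, `lane0_adds_epu8`, `broadcast_epi64x`, `suffix_self_check`) are made up for this example and are not part of any intrinsics header.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Signed saturating add: 127 + 1 clamps at 127, so every byte stays 0x7F.
   Returns the lowest byte of the result. */
static uint8_t lane0_adds_epi8(void) {
    __m128i a = _mm_set1_epi8(0x7F);
    __m128i b = _mm_set1_epi8(0x01);
    return (uint8_t)_mm_cvtsi128_si32(_mm_adds_epi8(a, b));
}

/* Unsigned saturating add on the same bits: 127 + 1 = 128, no clamping,
   so every byte becomes 0x80. Returns the lowest byte of the result. */
static uint8_t lane0_adds_epu8(void) {
    __m128i a = _mm_set1_epi8(0x7F);
    __m128i b = _mm_set1_epi8(0x01);
    return (uint8_t)_mm_cvtsi128_si32(_mm_adds_epu8(a, b));
}

/* _mm_set1_epi64x takes a plain 64-bit integer (not an __m64) and
   copies it into both 64-bit elements of the XMM register. */
static void broadcast_epi64x(int64_t v, int64_t out[2]) {
    _mm_storeu_si128((__m128i *)out, _mm_set1_epi64x(v));
}

/* Small usage example that checks the behavior described above. */
static void suffix_self_check(void) {
    int64_t out[2];
    broadcast_epi64x(-5, out);
    assert(out[0] == -5 && out[1] == -5);
    assert(lane0_adds_epi8() == 0x7F);
    assert(lane0_adds_epu8() == 0x80);
}
```

The `__m64` variant `_mm_set1_epi64` is left out on purpose, since MMX support differs between compilers and targets.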

What I would speculate is that `si64` and `si128` mean scalar integer of width 64/128. Notice that `_mm_add_si64` exists (it is not an original SSE intrinsic but an SSE2 intrinsic that extends the original MMX instruction set and uses MMX registers). It's `si64`, not `pi64`, because only one element of the same size as the whole register is involved.
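
One way to see the "whole register as a single integer" reading of `si128` is to compare a per-element shift with a whole-register byte shift. This is only a sketch, assuming an x86 target with SSE2; the helper names (`store_lanes`, `shift_per_lane`, `shift_whole_register`, `si128_self_check`) are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Copy the four 32-bit elements to out, lowest element first. */
static void store_lanes(__m128i v, uint32_t out[4]) {
    _mm_storeu_si128((__m128i *)out, v);
}

/* epi32 shift: each 32-bit element is shifted left by 8 bits on its own,
   so the 0x7F in the top byte of element 0 is shifted out and lost. */
static void shift_per_lane(uint32_t out[4]) {
    __m128i v = _mm_set_epi32(0, 0, 0, 0x7F000000);
    store_lanes(_mm_slli_epi32(v, 8), out);
}

/* si128 shift: the register is shifted left by 1 byte as one 128-bit
   value, so the 0x7F crosses the boundary into element 1. */
static void shift_whole_register(uint32_t out[4]) {
    __m128i v = _mm_set_epi32(0, 0, 0, 0x7F000000);
    store_lanes(_mm_slli_si128(v, 1), out);
}

/* Small usage example that checks both behaviors. */
static void si128_self_check(void) {
    uint32_t per_lane[4], whole[4];
    shift_per_lane(per_lane);
    shift_whole_register(whole);
    assert(per_lane[0] == 0 && per_lane[1] == 0);
    assert(whole[0] == 0 && whole[1] == 0x7F);
}
```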

Lastly, `piN` means packed integer of element size N targeting MMX (`__m64`), and `epiN` means packed integer of element size N targeting XMM (`__m128i`).
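
To make the "element size N" part concrete, the following sketch adds 1 to the same register contents once as packed 8-bit elements and once as packed 32-bit elements. It assumes an x86 target with SSE2 and covers only the XMM (`epiN`) forms; the MMX (`piN`) forms are omitted because `__m64` support varies between compilers. The function names (`low32_add_epi8`, `low32_add_epi32`, `epi_width_self_check`) are invented for the example.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Add as sixteen 8-bit elements: the low byte 0xFF + 0x01 wraps to 0x00
   and no carry reaches the neighboring byte. Returns the low 32 bits. */
static uint32_t low32_add_epi8(void) {
    __m128i a = _mm_cvtsi32_si128(0x000000FF);
    __m128i one = _mm_cvtsi32_si128(1);
    return (uint32_t)_mm_cvtsi128_si32(_mm_add_epi8(a, one));
}

/* Add as four 32-bit elements: the carry propagates within the element,
   giving 0x00000100. Returns the low 32 bits. */
static uint32_t low32_add_epi32(void) {
    __m128i a = _mm_cvtsi32_si128(0x000000FF);
    __m128i one = _mm_cvtsi32_si128(1);
    return (uint32_t)_mm_cvtsi128_si32(_mm_add_epi32(a, one));
}

/* Small usage example that checks both results. */
static void epi_width_self_check(void) {
    assert(low32_add_epi8() == 0x00000000u);
    assert(low32_add_epi32() == 0x00000100u);
}
```

The register type is `__m128i` in both cases; only the suffix of the intrinsic decides where the element boundaries are.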

Aki Suihkonen
  • Pretty close to accepting this as the answer. I am only missing what the `s` in `si` stands for, same for the `p` in `pi`. Also, what is the difference between `piX` and `siX`? – Brotcrunsher Jan 30 '22 at 05:04
  • @Brotcrunsher: `si` is I think scalar integer, just like `ss` is scalar single-precision vs. `ps` packed single. e.g. `_mm_loadu_si32(void*)` and `_mm_cvtsi32_si128(int)` are intrinsics for `movd`, and `_mm_cvtsi32_sd` is `cvtsi2sd` (int32 -> FP conversion). `si128` like bitwise booleans and integer loads are the whole vector as a notional scalar integer that's really wide, because there aren't any meaningful element boundaries. Also with byte-shift shuffles like `pslldq` = `_mm_bslli_si128` – Peter Cordes Jan 30 '22 at 05:11
  • I would guess the `epi`, `pi` distinction follows the earlier naming convention, where `ax,bx, ...` were extended to `eax, ebx, ...`. – Aki Suihkonen Jan 30 '22 at 05:17
  • @Aki: 4: `epi64x` exists because they already used up the sensible names for MMX -> XMM stuff, like SSE2 `__m128i _mm_set1_epi64 (__m64 a)`. I have no clue why they used the `x` name for the plain `int64_t` version (or `__int64` as Intel would have it); seems very shortsighted from our perspective with MMX being long obsolete and `int64_t` being highly relevant especially with SSE4 providing even more stuff you can do with them, and wider vectors to make it worthwhile. The two intrinsics involving epi64x at all (`set` and `set1`) were new in SSE2, along with the same-named epi64 versions. – Peter Cordes Jan 30 '22 at 05:19
  • You might be partially right: I think some (versions of) compilers chose not to provide `_mm_set_epi64x` for 32-bit builds for some reason, even though . But the same-named intrinsic always did the same thing if it existed at all. – Peter Cordes Jan 30 '22 at 05:20
  • `epi64x` *was* specific to 64-bit architectures (but [neither of the two intrinsics using it](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2&text=epi64x), just `_mm_set*`, were a single instruction anyway so there was no point not providing it). – Peter Cordes Jan 30 '22 at 05:33
  • **I still strongly disagree with saying `epi64` is used in 32-bit architectures.** It's used with `__m64` => `__m128i` forms of a few things, and inconsistently with that it's used in SSE4.1 `__int64 _mm_extract_epi64`. It's also used to describe elements for `pmovzxbq` etc. which have no interaction with scalar `int` or `__int64`, or the target bitness, or MMX. (IIRC, MSVC disables MMX support when targeting 64-bit code for some reason. But that's just MSVC; all other compilers provide `_mm_set_epi64` in both modes.) – Peter Cordes Jan 30 '22 at 05:35
  • I will try to reformulate – Aki Suihkonen Jan 30 '22 at 05:36
  • Ah, yes, then `si64` is MMX (not 32 or 64-bit mode), and all uses of `si64x` do require 64-bit mode (at least for the single instruction Intel documents; compilers may allow `_mm_cvtsi64x_si128(__int64)` in 32-bit mode via multiple instructions from scalar regs, or a load from memory if the int64 happens to be there) – Peter Cordes Jan 30 '22 at 05:39
  • Good update. Note that unlike pure data movement, `_mm_cvtsd_si64x` *can't* be emulated efficiently on a 32-bit machine (without AVX-512 for packed conversion from double to int64_t). In compat/legacy mode without AVX512, the only other double -> int64 option is loading it into the x87 FPU for `fistp m64` (which is supported all the way back to 8087), or bit-hack tricks that use a few instructions on the FP bit pattern. None of which would be appropriate for that intrinsic. – Peter Cordes Jan 30 '22 at 06:28
  • It seems GCC/clang/MSVC all provide `_mm_set[1]_epi64x` in 32-bit mode, but not `_mm_cvtsi64x_si128` or `_mm_cvtsi128_si64x`: https://godbolt.org/z/TeW5xscxv The `cvt` intrinsics are nominally a single instruction, unlike `_mm_set`, so that makes sense. Also, GCC and clang disagree about the calling convention (mm0 vs. stack) for passing an `__m64` arg in 32-bit mode! – Peter Cordes Jan 30 '22 at 06:35
  • Related: [Meaning of suffix "x" in intrinsics like "\_mm256\_set1\_epi64x"](https://stackoverflow.com/q/44989391) re: MMX having already taken the `_mm_set[1]_epi64` name with `__m64` args, hence the `epi64x` to indicate `__int64`. This changed for AVX-512, which uses `_mm512_set[1]_epi64` non-x. – Peter Cordes Aug 19 '23 at 20:01