Where is the intermediate result of a SIMD instruction stored?

Question

In Intel Intrinsics, you may find such instructions:

The description reads: "Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst." So the instruction produces the full 128-bit product as an intermediate result, but stores only the low 64 bits.

How do I get the high 64 bits?

A similar instruction is `mulx`, which multiplies two 64-bit integers and stores all 128 bits of the result in two 64-bit registers. In fact, I just want a SIMD version of `mulx`.
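
For reference, here is a minimal scalar sketch of the operation being asked for: a 64×64→128-bit unsigned multiply that returns both halves. It assumes a compiler that supports `unsigned __int128` (a GCC/Clang extension, not ISO C); the names `u128_parts`, `mul_wide_u64` and `mul_wide_u64_check` are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two 64-bit halves of a 128-bit product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Scalar 64x64 -> 128-bit unsigned multiply.
   Assumes unsigned __int128 is available (GCC/Clang extension). */
u128_parts mul_wide_u64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r;
    r.lo = (uint64_t)p;          /* low 64 bits: what a mullo-style operation keeps */
    r.hi = (uint64_t)(p >> 64);  /* high 64 bits: the part the question asks about */
    return r;
}

/* Small usage check: (2^32) * (2^32) = 2^64, so lo == 0 and hi == 1. */
void mul_wide_u64_check(void)
{
    u128_parts r = mul_wide_u64(1ULL << 32, 1ULL << 32);
    assert(r.lo == 0 && r.hi == 1);
}
```

On x86-64, compilers typically lower the widening multiply to a single `mul` or `mulx`, but that is a code-generation detail and not something the C code guarantees.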

The intermediate result is not stored anywhere. That's why it's called an intermediate result. (In reality, there is no intermediate result. The text describes how the function behaves, not how it's implemented in silicon.) — Raymond Chen, Apr 28 '20 at 13:08
Unfortunately, there is no corresponding `_mm256_mulhi_epi64`. I suppose there is no SIMD 64×64→128 multiplication available. Note also that you don't even need `mulx` for such a multiplication; plain old `mul` and `imul` both have extending variants. — fuz, Apr 28 '20 at 13:20
The closest you can get is AVX512IFMA [`VPMADD52HUQ` / `_mm512_madd52hi_epu64`](https://www.felixcloutier.com/x86/vpmadd52huq). Mysticial has written some about using FP FMA operations to do integer math without this ISA extension, though. Note that even `_mm256_mullo_epi64` requires AVX512DQ, not just AVX2. — Peter Cordes, Apr 28 '20 at 14:01
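
Since no packed 64×64→128 multiply is mentioned as available, one fallback is to assemble the high half from 32×32→64-bit partial products. The following is only a scalar sketch of that schoolbook decomposition, not a tuned SIMD routine; `mulhi_u64_limbs` and `mulhi_u64_limbs_check` are hypothetical names. Each partial product has the shape that `_mm256_mul_epu32` computes per 64-bit lane, so the same arithmetic could in principle be applied lane by lane.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the unsigned 64x64 -> 128-bit product,
   built only from 32x32 -> 64-bit multiplies (schoolbook method). */
uint64_t mulhi_u64_limbs(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four partial products; each fits in 64 bits. */
    uint64_t ll = a_lo * b_lo;
    uint64_t lh = a_lo * b_hi;
    uint64_t hl = a_hi * b_lo;
    uint64_t hh = a_hi * b_hi;

    /* Sum of the terms that land in bits 32..63 of the full product.
       Three values below 2^32 are added, so this cannot overflow. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    /* Carry out of the middle column plus the upper contributions. */
    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Small usage check: (2^64 - 1)^2 = 2^128 - 2^65 + 1, high half 2^64 - 2. */
void mulhi_u64_limbs_check(void)
{
    assert(mulhi_u64_limbs(UINT64_MAX, UINT64_MAX) == 0xFFFFFFFFFFFFFFFEULL);
}
```

The low half needs no such treatment, because it is the ordinary wrapping product `a * b`.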
