In the Intel Intrinsics Guide, you can find instructions such as:
Its description reads: "Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in dst." So the instruction produces the full 128-bit product as an intermediate result, but stores only the low 64 bits.
How do I get the high 64 bits?
A similar instruction is `mulx`: it multiplies two 64-bit integers and stores all 128 bits of the result in two 64-bit registers. In fact, I just want to find a SIMD version of `mulx`.
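To make the desired semantics concrete, here is a minimal scalar sketch in portable C of what each 64-bit lane would need to compute: the full 128-bit unsigned product, split into a low and a high half the way `mulx` splits it across two registers. The helper name `mul64_wide` is hypothetical, and the 32-bit partial-product decomposition is just one way to write it. This is not a SIMD solution, only a reference for the expected result.

```c
#include <stdint.h>

/* Hypothetical helper (not a real intrinsic): full 64x64 -> 128-bit
   unsigned multiply. Returns the low 64 bits and writes the high
   64 bits to *hi, mirroring the two outputs of mulx. */
static uint64_t mul64_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
    /* Split each operand into 32-bit halves. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64-bit partial products. */
    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Sum the middle column; at most 3 * (2^32 - 1), so no overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    /* High half: top product plus carries out of the middle column. */
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);

    /* Low half: low 32 bits of mid on top of the low 32 bits of p0. */
    return (mid << 32) | (uint32_t)p0;
}
```

For example, multiplying `0xFFFFFFFFFFFFFFFF` by itself gives a low half of `1` and a high half of `0xFFFFFFFFFFFFFFFE`. The high half is the part that the intrinsic described above discards.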