I was looking for an appropriate AVX2 multiplication instruction to use in my code, and came across the vpmulhrsw (_mm256_mulhrs_epi16(__m256i a, __m256i b)) instruction.
The description on the Intel Intrinsics Guide says:
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to dst.
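To make the quoted description concrete, here is a minimal scalar sketch of what a single 16-bit lane computes, as I read it. The name mulhrs_scalar is a hypothetical helper for illustration, not an Intel API, and the code assumes that right-shifting a negative signed integer is an arithmetic shift and that out-of-range narrowing wraps (both are implementation-defined in C, but hold on common x86 compilers).

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of vpmulhrsw / _mm256_mulhrs_epi16.
   Hypothetical helper for illustration only. */
static int16_t mulhrs_scalar(int16_t a, int16_t b)
{
    /* Full signed 32-bit product of the two 16-bit inputs. */
    int32_t prod = (int32_t)a * (int32_t)b;

    /* Keep the 18 most significant bits (drop the low 14), then add 1.
       Assumes >> on a negative value is an arithmetic shift. */
    int32_t tmp = (prod >> 14) + 1;

    /* Bits [16:1] of tmp: drop bit 0 and narrow to 16 bits.
       Assumes the narrowing conversion wraps, as the instruction does
       for the single overflow case (-32768 * -32768). */
    return (int16_t)(tmp >> 1);
}
```

For example, mulhrs_scalar(16384, 16384) yields 8192 and mulhrs_scalar(3, 16384) yields 2. Arithmetically, the two steps together are the same as computing (a * b + 0x4000) >> 15.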
I understand what the instruction does, but it sounds like it is tailored to some very specific use case. What is this use case?