To start the discussion, the basic differences between _mm_mul_epu32 and _mm_mul_epi32 are:
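_mm_mul_epu32 is available in SSE2. It treats the low 32 bits of each 64-bit lane as unsigned integers and produces unsigned 64-bit products (32 bit -> 64 bit).
_mm_mul_epi32 is available in SSE4.1. It treats the same 32-bit inputs as signed integers and produces signed 64-bit products (32 bit -> 64 bit).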
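To make the difference concrete, here is a minimal scalar sketch of what a single 64-bit lane computes in each case. The function names mul_epu32_lane and mul_epi32_lane are hypothetical helpers invented for illustration; they are not intrinsics, and the real instructions do this for two lanes at once.

```cpp
#include <cstdint>

// Scalar model of one 64-bit lane of _mm_mul_epu32: the low 32 bits of each
// input are read as unsigned and widened to an unsigned 64-bit product.
// (Hypothetical helper name, not part of any intrinsics header.)
constexpr std::uint64_t mul_epu32_lane(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b);
}

// Scalar model of one 64-bit lane of _mm_mul_epi32: the same 32 input bits
// are read as signed and widened to a signed 64-bit product.
constexpr std::int64_t mul_epi32_lane(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
}

// -1 and 0xFFFFFFFF share one bit pattern, yet the 64-bit products differ.
static_assert(mul_epi32_lane(-1, 2) == -2, "signed product");
static_assert(mul_epu32_lane(0xFFFFFFFFu, 2u) == 0x1FFFFFFFEull, "unsigned product");
```

The two results differ only in the upper 32 bits of the 64-bit product; the lower 32 bits (0xFFFFFFFE) are the same in both cases.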
What I don't understand is when one should use _mm_mul_epu32. There doesn't seem to be a matching set intrinsic for unsigned values, analogous to _mm_set[1]_epi32. For example, in the question "SSE multiplication of 4 32-bit integers", the best answer writes:
static inline __m128i muly(const __m128i &a, const __m128i &b)
{
    __m128i tmp1 = _mm_mul_epu32(a, b); /* mul 2,0 */
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4)); /* mul 3,1 */
    /* shuffle results to [63..0] and pack */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
}
Here _mm_mul_epu32 is mixed with _epi32 intrinsics. Isn't it risky to ignore the difference between signed and unsigned integers?
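One way to probe this concern is to run muly on negative inputs and compare each lane with the scalar product truncated to 32 bits. The sketch below assumes an x86 target with SSE2; muly_matches_scalar is a hypothetical helper added for the check, and muly is repeated from the quoted answer. Since muly keeps only the low 32 bits of each 64-bit product, and in two's complement those bits do not depend on whether the inputs are read as signed or unsigned, the check would be expected to pass for negative values as well.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// muly as quoted from the linked answer (uses SSE2 only).
static inline __m128i muly(const __m128i &a, const __m128i &b)
{
    __m128i tmp1 = _mm_mul_epu32(a, b);                                      // lanes 0 and 2
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4)); // lanes 1 and 3
    // Keep the low 32 bits of each 64-bit product and interleave them.
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
}

// Hypothetical helper: true if every lane of muly(a, b) equals the scalar
// product of the corresponding elements, truncated to 32 bits.
bool muly_matches_scalar(const std::int32_t (&a)[4], const std::int32_t (&b)[4])
{
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));

    std::uint32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), muly(va, vb));

    for (int i = 0; i < 4; ++i) {
        // Multiply as unsigned so the result wraps modulo 2^32 instead of
        // triggering signed-overflow undefined behavior.
        std::uint32_t expected =
            static_cast<std::uint32_t>(a[i]) * static_cast<std::uint32_t>(b[i]);
        if (out[i] != expected) return false;
    }
    return true;
}
```

Calling muly_matches_scalar with arrays such as {-3, 7, -5, 100000} and {4, -6, -5, 100000} exercises negative operands and a product that overflows 32 bits.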
Can someone please provide an example of where _mm_mul_epu32 can be safely used? Thanks!