To start the discussion, the basic differences between _mm_mul_epu32 and _mm_mul_epi32 are:

  • _mm_mul_epu32 is available in SSE2; it takes unsigned 32-bit integers and produces unsigned 64-bit products

  • _mm_mul_epi32 is available in SSE4.1; it takes signed 32-bit integers and produces signed 64-bit products

What I don't understand is when one should use _mm_mul_epu32. There doesn't seem to be an unsigned counterpart to the set intrinsics such as _mm_set[1]_epi32. For example, in "SSE multiplication of 4 32-bit integers", the best answer writes:

static inline __m128i muly(const __m128i &a, const __m128i &b)
{
    __m128i tmp1 = _mm_mul_epu32(a, b);                                       /* mul 2,0 */
    __m128i tmp2 = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4)); /* mul 3,1 */
    /* shuffle results to [63..0] and pack */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(tmp2, _MM_SHUFFLE(0, 0, 2, 0)));
}

Here _mm_mul_epu32 is mixed with _epi32 intrinsics. Isn't it risky to ignore the difference between signed and unsigned integers?

Can someone please provide an example of where _mm_mul_epu32 can be safely used? Thanks!

  • Two's complement encoding makes the difference between a signed and an unsigned number disappear for loads, stores, adds and subs. Which is why there's only _mm_set_epi32 and no _mm_set_epu32. But it does matter for muls and divs. – Hans Passant Oct 24 '13 at 21:14
  • Thank you! I think I'll need casting between signed int and unsigned int, since I happen to be wanting to use 4-way SSE to simulate 64-bit multiplication in vector registers. @chys Thank you! Unfortunately, I don't have enough reputation to vote up your answer yet. –  Nov 07 '13 at 23:18

1 Answer

Use _mm_mul_epu32 when the operands should be considered unsigned integers, and _mm_mul_epi32 otherwise.

In 32-bit -> 64-bit multiplication, treating the operands as unsigned or signed yields different results, so there are separate instructions. Add, sub and mov produce the same bits either way, so they don't need separate instructions. There is no separate __m128u type; just use __m128i and remember that it contains unsigned numbers.
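To make the difference concrete, here is a minimal sketch using only SSE2. The helper names (mul_lane0_unsigned, mul_signed_reference, demo) are made up for illustration, and the signed product is computed in scalar code as a stand-in for what _mm_mul_epi32 would return, so that the example does not need SSE4.1. It multiplies the bit pattern of -1 by 2: the full 64-bit results differ, but their low 32 bits agree, which is why a routine that keeps only the low 32 bits of each product can use _mm_mul_epu32 on signed data.

```cpp
#include <emmintrin.h>  // SSE2: _mm_set1_epi32, _mm_mul_epu32, _mm_storeu_si128
#include <cassert>
#include <cstdint>

// Hypothetical helper: multiply lane 0 of two vectors with _mm_mul_epu32
// and return the full 64-bit product.
static inline uint64_t mul_lane0_unsigned(int32_t x, int32_t y)
{
    // _mm_set1_epi32 only copies bit patterns, so -1 is stored as 0xFFFFFFFF.
    __m128i a = _mm_set1_epi32(x);
    __m128i b = _mm_set1_epi32(y);
    __m128i p = _mm_mul_epu32(a, b);  // multiplies lanes 0 and 2 as unsigned

    uint64_t out[2];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), p);
    return out[0];
}

// Scalar stand-in for the signed 32-bit -> 64-bit product.
static inline int64_t mul_signed_reference(int32_t x, int32_t y)
{
    return static_cast<int64_t>(x) * static_cast<int64_t>(y);
}

// Hypothetical self-check: call it from main() to run the comparison.
static void demo()
{
    const uint64_t u = mul_lane0_unsigned(-1, 2);    // 0xFFFFFFFF * 2
    const int64_t  s = mul_signed_reference(-1, 2);  // -1 * 2

    assert(u == 0x1FFFFFFFEULL);  // unsigned interpretation
    assert(s == -2);              // signed interpretation
    // The low 32 bits are identical in both interpretations.
    assert(static_cast<uint32_t>(u) == static_cast<uint32_t>(s));
}
```

So if the full 64-bit product is needed, the choice of intrinsic has to match the signedness of the data; if only the low 32 bits are kept, either interpretation gives the same bits.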

chys