I'm wondering whether it is possible to do the following calculation on four values in parallel within an MMX register:

(a*b)/256

where a is a signed 16-bit word and b is an unsigned value (a blend factor) in the range 0 to 256.

I think my problem is that I'm not sure how (or whether) pmullw and pmulhw can help me with this task.

jsi1

1 Answer

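If you know that a*b won't overflow a signed 16-bit field, then you can use pmullw (MMX intrinsic _mm_mullo_pi16, or SSE2 intrinsic _mm_mullo_epi16) and then shift right by 8 to do the division by 256. Because a is signed, the shift should be arithmetic (psraw, i.e. _mm_srai_pi16 or _mm_srai_epi16) so that negative products keep their sign. Note that an arithmetic shift rounds toward negative infinity, whereas integer division in C rounds toward zero.

MMX:

__m64 a, b;
...
a = _mm_mullo_pi16 (a, b);  /* low 16 bits of each product */
a = _mm_srai_pi16 (a, 8);   /* arithmetic shift: divide by 256, keep sign */

SSE2:

__m128i a, b;
...
a = _mm_mullo_epi16 (a, b);  /* low 16 bits of each product */
a = _mm_srai_epi16 (a, 8);   /* arithmetic shift: divide by 256, keep sign */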
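
To see why the no-overflow condition matters, here is a minimal scalar sketch in C of what a single 16-bit lane does under pmullw followed by an arithmetic shift right by 8. The helper names (`lane_mullo`, `floor_div256`, `lane_mullo_sra8`) are hypothetical and exist only for illustration; the code models the lane arithmetic and is not meant as a replacement for the SIMD version.

```c
#include <stdint.h>

/* Hypothetical model of one pmullw lane: keep only the low 16 bits of the
   product and reinterpret them as a signed 16-bit value. */
static int16_t lane_mullo(int16_t a, int16_t b)
{
    uint32_t low = ((uint32_t)(int32_t)a * (uint32_t)(int32_t)b) & 0xFFFFu;
    int32_t  v   = (low >= 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
    return (int16_t)v;
}

/* Floor division by 256, which is what an arithmetic right shift by 8
   computes (written out portably; intended for 16-bit-range inputs). */
static int32_t floor_div256(int32_t v)
{
    return (v >= 0) ? v / 256 : -((-v + 255) / 256);
}

/* pmullw followed by an arithmetic shift right by 8, for a single lane. */
static int16_t lane_mullo_sra8(int16_t a, int16_t b)
{
    return (int16_t)floor_div256(lane_mullo(a, b));
}
```

With a = 100 and b = 128 the product 12800 fits in 16 bits and the lane yields 50. With a = 1000 and b = 200 the product 200000 wraps to 3392, so the lane yields 13 instead of the intended 781.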
mattst88
  • The problem is that a*b **will** overflow a 16-bit field. Is there a way to handle that anyway? I'll take a closer look at SSE2. Thanks mattst88! – jsi1 Jun 22 '12 at 20:48
  • If a*b overflows 16 bits, then you can shift either a or b to the left by 8 bits, and then do pmulhw. – Eugene Smith Jun 23 '12 at 00:04
  • The result will be correct because those bits of the result depend only on the lower bits of a and b. – phuclv Mar 08 '15 at 03:38
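
The pre-shift suggested in the comments only works while the shifted operand still fits in a signed 16-bit word, because pmulhw treats both inputs as signed. That does not hold for b << 8 once b reaches 128, nor for a << 8 when a needs more than 8 bits. As an alternative sketch (not taken from the answers above), the low and high halves of the product can be combined to extract bits 8 to 23 of the full 32-bit result. The function name `blend_mul_shift8` is hypothetical, and the sketch assumes SSE2 and 0 <= b <= 256 in every lane.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: per 16-bit lane, computes floor((a * b) / 256) for
   signed a and a blend factor b assumed to lie in 0..256. */
static __m128i blend_mul_shift8(__m128i a, __m128i b)
{
    __m128i lo = _mm_mullo_epi16(a, b);  /* bits 0..15 of the 32-bit product */
    __m128i hi = _mm_mulhi_epi16(a, b);  /* bits 16..31 (signed multiply)    */

    /* Bits 8..23 of the product: the low byte of hi becomes the high byte
       of the result, and the high byte of lo becomes the low byte. */
    return _mm_or_si128(_mm_slli_epi16(hi, 8), _mm_srli_epi16(lo, 8));
}
```

Since |a * b| is at most 32768 * 256 = 2^23 under the stated assumption, the shifted result always fits in a signed 16-bit lane. The same sequence should translate to MMX with _mm_mullo_pi16, _mm_mulhi_pi16, _mm_slli_pi16, _mm_srli_pi16 and _mm_or_si64. As with an arithmetic shift, negative results round toward negative infinity.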