7

I'm working with SSE intrinsic functions. I have an __m128i representing an array of 8 signed short (16 bit) values.

Is there a function to get the sign of each element?

EDIT1: something that can be used like this:

short tmpVec[8];
__m128i tmp, sgn;
int i;

for (i = 0; i < 8; i++)
    tmp.m128i_i16[i] = tmpVec[i];

sgn = _mm_sign_epi16(tmp);

Of course "_mm_sign_epi16" doesn't exist, so that's what I'm looking for.

How slow is it to do it element by element?

EDIT2: desired behaviour: 1 for positive values, 0 for zero, and -1 for negative values.

thanks

Paul R
Michele
  • In what format do you wish to receive the sign? Please post an example. – Sergey L. Apr 25 '14 at 12:53
  • [Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/) – Sean Bright Apr 25 '14 at 12:53
  • Depending on how you use the value, use a comparison with 0, a right shift by 15, or `and` each element with 0x8000 – phuclv Apr 25 '14 at 12:59
  • Thanks! What do you mean by "how I use the value"? – Michele Apr 25 '14 at 13:44
  • Do you want the sign result for each element to be +1/0/-1, or +1/-1, or 1/0, or what ? – Paul R Apr 25 '14 at 13:48
  • If you only want the sign bit, you can use `_mm_movemask_epi8(_mm_packs_epi16(tmp, _mm_setzero_si128()))`. In words: Pack into 8-bit values via signed saturation, setting the upper 64 bits to zero. This preserves sign. Then extract the 16 sign bits. Since the upper 64 bits are zero, the upper 8 sign bits will be zero. – Raymond Chen Apr 25 '14 at 14:01
  • I'll give it a try, but this won't give me 0 if an element is zero, will it? – Michele Apr 25 '14 at 14:23
  • The sign bit is 1 if the value is negative, and it is 0 if the value is zero or positive. I see that you clarified in your edit that you want -1/0/1, in which case `_mm_movemask_epi8` will not help you. – Raymond Chen Apr 25 '14 at 21:36
  • @PaulR, How would you do it for Floating Point numbers when limited to SSE3? Thank You. – Royi Dec 24 '16 at 12:29
  • @Royi: it should be pretty simple, but post a new question with an [sse] tag, and I'll do my best to come up with a solution (as will others no doubt). Be sure to specify exactly what outputs you require for different input cases. – Paul R Dec 24 '16 at 12:35
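
For reference, the pack-and-movemask idea from Raymond Chen's comment can be sketched as below. This only extracts sign bits (1 for negative, 0 for zero or positive), so it does not give the -1/0/+1 result asked for in EDIT2. The helper name `signbits8_epi16` and the array-based interface are assumptions made for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: returns an 8-bit mask where bit i is set
   if in[i] is negative. */
static int signbits8_epi16(const short in[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Signed saturation keeps the sign of every element; the upper
       8 bytes come from the zero register. */
    __m128i packed = _mm_packs_epi16(v, _mm_setzero_si128());

    /* Collect the top bit of each of the 16 bytes; the upper 8 bits
       of the result are therefore always zero. */
    return _mm_movemask_epi8(packed);
}
```

Bit 0 of the result corresponds to the first element of the array.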

3 Answers

14

You can use min/max operations to get the desired result, e.g.

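#include <emmintrin.h> // SSE2

static inline __m128i _mm_sgn_epi16(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));  // clamp positive values to +1
    v = _mm_max_epi16(v, _mm_set1_epi16(-1)); // clamp negative values to -1
    return v;
}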

This is probably a little more efficient than explicitly comparing with zero + shifting + combining results.
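
As for the element-by-element copy in the question, a minimal sketch of loading all eight values with a single unaligned load is shown below, followed by the same min/max clamp. The helper name `sign8_epi16` and the array-based interface are assumptions made for illustration, not part of any library.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: writes the sign (-1, 0 or +1) of each of
   the 8 input shorts to out. */
static void sign8_epi16(const short in[8], short out[8])
{
    /* One unaligned load replaces the per-element copy loop. */
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    v = _mm_min_epi16(v, _mm_set1_epi16(1));  /* positives clamp to +1 */
    v = _mm_max_epi16(v, _mm_set1_epi16(-1)); /* negatives clamp to -1 */

    /* One unaligned store writes all 8 results back. */
    _mm_storeu_si128((__m128i *)out, v);
}
```

If the arrays are known to be 16-byte aligned, `_mm_load_si128` and `_mm_store_si128` could be used instead.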

Note that there is already an _mm_sign_epi16 intrinsic in SSSE3 (PSIGNW - see tmmintrin.h), which behaves somewhat differently, so I changed the name for the required function to _mm_sgn_epi16. Using _mm_sign_epi16 might be more efficient when SSSE3 is available however, so you could do something like this:

#include <emmintrin.h> // SSE2
#ifdef __SSSE3__
#include <tmmintrin.h> // SSSE3
#endif

static inline __m128i _mm_sgn_epi16(__m128i v)
{
#ifdef __SSSE3__
    v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
#else
    v = _mm_min_epi16(v, _mm_set1_epi16(1));  // use PMINSW/PMAXSW on SSE2/SSE3.
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
#endif
    return v;
}
Paul R
  • Note that the two `_mm_set1_epi16`s can be optimized to `_mm_cmpeq_epi16(v, v)` (for -1) and `_mm_srli_epi16(_mm_cmpeq_epi16(v, v), 15)` (for 1). This avoids domain stalls and encodes to just 1 or 2 instructions. – Raymond Chen Apr 28 '14 at 13:41
  • @RaymondChen: true, but any decent compiler will hoist these constants outside the loop. I guess it wouldn't hurt to declare them explicitly prior to the loop though. – Paul R Apr 28 '14 at 14:41
  • It depends how much register pressure you are under. It could be a win or loss either way - you just have to try it both ways and see. – Raymond Chen Apr 28 '14 at 17:10
  • @RaymondChen: again true, but in this particular case register usage is very low, even for a 32 bit build. – Paul R Apr 29 '14 at 01:51
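
The constant-generation trick discussed in these comments could look roughly like the sketch below, using the immediate-count shift `_mm_srli_epi16`. The helper name `sign8_regconst_epi16` is made up for illustration, and whether this beats `_mm_set1_epi16` depends on the compiler and surrounding code, so it is worth measuring both.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: same -1/0/+1 result as the min/max version,
   but the +1 and -1 constants are built in registers. */
static void sign8_regconst_epi16(const short in[8], short out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* v == v is true in every lane, so every lane becomes 0xFFFF (-1). */
    __m128i minus1 = _mm_cmpeq_epi16(v, v);

    /* Logical right shift by 15 leaves only the top bit: 0x0001. */
    __m128i plus1 = _mm_srli_epi16(minus1, 15);

    v = _mm_min_epi16(v, plus1);  /* positives clamp to +1 */
    v = _mm_max_epi16(v, minus1); /* negatives clamp to -1 */

    _mm_storeu_si128((__m128i *)out, v);
}
```
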
1

Fill a register with zeros and compare it with your register, first with "greater than", then with "less than" (or swap the order of the operands in the "greater than" instruction).
http://msdn.microsoft.com/en-us/library/xd43yfsa%28v=vs.90%29.aspx
http://msdn.microsoft.com/en-us/library/t863edb2%28v=vs.90%29.aspx

The problem at this point is that a true result is represented as 0xffff, which is -1 as a signed 16-bit value: the correct result for negative numbers but not for positive ones. However, as pointed out by Raymond Chen in the comments, 0x0000 - 0xffff = 0x0001, so it is enough to subtract the result of "greater than" from the result of "less than". http://msdn.microsoft.com/en-us/library/y25yya27%28v=vs.90%29.aspx

Of course Paul R's answer is preferable, as it uses only 2 instructions.
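
A minimal sketch of the compare-and-subtract steps described above, assuming SSE2 and an array-based interface (the helper name `sign8_cmp_epi16` is made up for illustration):

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: sign of each of 8 shorts via two compares
   and one subtraction. */
static void sign8_cmp_epi16(const short in[8], short out[8])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i zero = _mm_setzero_si128();

    __m128i gt = _mm_cmpgt_epi16(v, zero); /* 0xFFFF (-1) where v > 0 */
    __m128i lt = _mm_cmplt_epi16(v, zero); /* 0xFFFF (-1) where v < 0 */

    /* lt - gt:  v < 0  -> -1 - 0    = -1
                 v > 0  ->  0 - (-1) = +1
                 v == 0 ->  0 - 0    =  0 */
    __m128i r = _mm_sub_epi16(lt, gt);

    _mm_storeu_si128((__m128i *)out, r);
}
```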

Antonio
0

You can shift all 8 shorts at once using _mm_srai_epi16(tmp, 15), which returns eight 16-bit integers, each being all ones (i.e. -1) if the input was negative, or all zeros (i.e. 0) if the input was zero or positive.
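
A small sketch of that shift, assuming SSE2 and an array-based interface (the helper name `signmask8_epi16` is made up for illustration). Note that it yields a -1/0 mask, not the full -1/0/+1 sign:

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: out[i] is -1 if in[i] is negative, else 0. */
static void signmask8_epi16(const short in[8], short out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Arithmetic right shift by 15 copies the sign bit into every
       bit of each 16-bit lane. */
    v = _mm_srai_epi16(v, 15);

    _mm_storeu_si128((__m128i *)out, v);
}
```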

John Zwinck
  • The sign function should return 1 for positive values, 0 for zero, and -1 for negative values. http://en.wikipedia.org/wiki/Sign_function – Antonio Apr 25 '14 at 14:07
  • @Antonio: how do you know that is what the OP wants? This is an honest question: I haven't seen anything so specific from him. – John Zwinck Apr 25 '14 at 14:09
  • Actually having 0 for zero would be the result I require. – Michele Apr 25 '14 at 14:23
  • @Mike: my solution does give 0 for zero. -1 for negative, 0 otherwise. If you need something else, please be more specific. – John Zwinck Apr 25 '14 at 14:25
  • @Mike: please make your question more specific - it's vague about what result you require, and you've ignored comments asking for clarification. – Paul R Apr 25 '14 at 14:36
  • Oh sorry, I misunderstood Antonio's comment and thought he was pointing out a different behaviour than this for all zeros. 1 for positive values, 0 for zero, and -1 for negative values is indeed the behaviour I want. I'll put it in the main question. Thanks – Michele Apr 25 '14 at 14:50
  • @JohnZwinck, I was working on this myself but Paul beat me to it (again). `__m128i s1 = _mm_srai_epi16(v, 15); __m128i s2 = _mm_add_epi16(_mm_set1_epi16(1),s1); __m128i s3 = _mm_add_epi16(s1,s2);` That seemed to do it but I guess Paul's answer is still better. – Z boson Apr 25 '14 at 15:21