SSE intrinsic over int16[8] to extract the sign of each element

Question

I'm working with SSE intrinsic functions. I have an __m128i representing an array of 8 signed short (16 bit) values.

Is there a function to get the sign of each element?

EDIT1: something that can be used like this:

short tmpVec[8];
__m128i tmp, sgn;
int i;

for (i = 0; i < 8; i++)
    tmp.m128i_i16[i] = tmpVec[i];

sgn = _mm_sign_epi16(tmp);

Of course, "_mm_sign_epi16" as used here doesn't exist, so that's what I'm looking for.

How slow would it be to do it element by element?

EDIT2: desired behaviour: 1 for positive values, 0 for zero, and -1 for negative values.
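
For reference, the element-by-element version of the behaviour described in EDIT2 might look like the scalar sketch below. The names `sgn_i16` and `sgn_i16_array` are made up for illustration and do not come from any library.

```c
#include <stddef.h>

/* Scalar reference: +1 for positive, 0 for zero, -1 for negative. */
static short sgn_i16(short x)
{
    return (short)((x > 0) - (x < 0));
}

/* Apply the scalar sign function to n elements (hypothetical helper). */
static void sgn_i16_array(const short *in, short *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = sgn_i16(in[i]);
}
```

A scalar version like this is also handy as a baseline when checking any vectorised variant.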

thanks

In what format do you wish to receive the sign? Please post an example. — Sergey L., Apr 25 '14 at 12:53
[Intel Intrinsics Guide](https://software.intel.com/sites/landingpage/IntrinsicsGuide/) — Sean Bright, Apr 25 '14 at 12:53
Depending on how you use the value, compare with 0, shift right by 15, or `and` each element with 0x8000 — phuclv, Apr 25 '14 at 12:59
Do you want the sign result for each element to be +1/0/-1, or +1/-1, or 1/0, or what ? — Paul R, Apr 25 '14 at 13:48
If you only want the sign bit, you can use `_mm_movemask_epi8(_mm_packs_epi16(tmp, _mm_setzero_si128()))`. In words: Pack into 8-bit values via signed saturation, setting the upper 64 bits to zero. This preserves sign. Then extract the 16 sign bits. Since the upper 64 bits are zero, the upper 8 sign bits will be zero. — Raymond Chen, Apr 25 '14 at 14:01
I'll give it a try, but this won't give me 0 if an element is zero, will it? — Michele, Apr 25 '14 at 14:23
The sign bit is 1 if the value is negative, and it is 0 if the value is zero or positive. I see that you clarified in your edit that you want -1/0/1, in which case `_mm_movemask_epi8` will not help you. — Raymond Chen, Apr 25 '14 at 21:36
@PaulR, How would you do it for Floating Point numbers when limited to SSE3? Thank You. — Royi, Dec 24 '16 at 12:29
@Royi: it should be pretty simple, but post a new question with an [sse] tag, and I'll do my best to come up with a solution (as will others no doubt). Be sure to specify exactly what outputs you require for different input cases. — Paul R, Dec 24 '16 at 12:35
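
The sign-bit extraction that Raymond Chen describes in the comments above could be wrapped up roughly as follows. This is a sketch assuming SSE2; the function name `sign_bits_epi16` is hypothetical. It only reports negative versus non-negative, so it does not produce the -1/0/1 result asked for in EDIT2.

```c
#include <emmintrin.h> /* SSE2 */

/* Returns an 8-bit mask: bit i is set when 16-bit element i of v is negative. */
static int sign_bits_epi16(__m128i v)
{
    /* Signed saturation to 8 bits keeps each element's sign;
       the upper 8 bytes of the packed result are zero. */
    __m128i packed = _mm_packs_epi16(v, _mm_setzero_si128());

    /* Collect the top bit of each byte; only the low 8 bits can be set. */
    return _mm_movemask_epi8(packed);
}
```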

Paul R · Accepted Answer · 2014-04-25T14:54:41.530

14

You can use min/max operations to get the desired result, e.g.

#include <emmintrin.h> // SSE2

inline __m128i _mm_sgn_epi16(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));  // clamp positive values to +1
    v = _mm_max_epi16(v, _mm_set1_epi16(-1)); // clamp negative values to -1
    return v;
}

This is probably a little more efficient than explicitly comparing with zero + shifting + combining results.
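
To apply the min/max version to a plain array of eight shorts, the vector can be loaded and stored with unaligned moves rather than filled element by element. The following is a minimal sketch assuming SSE2; `sgn_epi16_minmax` and `sgn8_i16` are illustrative names, not existing intrinsics.

```c
#include <emmintrin.h> /* SSE2 */

/* Clamp every 16-bit lane to the range [-1, 1]. */
static __m128i sgn_epi16_minmax(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    return v;
}

/* Hypothetical helper: sign of eight shorts, array in, array out. */
static void sgn8_i16(const short in[8], short out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in); /* unaligned load */
    _mm_storeu_si128((__m128i *)out, sgn_epi16_minmax(v));
}
```

The unaligned load and store also avoid the compiler-specific `m128i_i16` member access used in the question.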

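Note that there is already an _mm_sign_epi16 intrinsic in SSSE3 (PSIGNW, see tmmintrin.h), which behaves somewhat differently: it takes two operands and, per element, negates the first where the second is negative, zeroes it where the second is zero, and leaves it unchanged otherwise. I therefore changed the name of the required function to _mm_sgn_epi16. Using _mm_sign_epi16 might be more efficient when SSSE3 is available, however, so you could do something like this: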

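#ifdef __SSSE3__
#include <tmmintrin.h> // SSSE3: _mm_sign_epi16
#else
#include <emmintrin.h> // SSE2: _mm_min_epi16 / _mm_max_epi16
#endif

inline __m128i _mm_sgn_epi16(__m128i v)
{
#ifdef __SSSE3__
    v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
#else
    v = _mm_min_epi16(v, _mm_set1_epi16(1));  // use PMINSW/PMAXSW on SSE2/SSE3.
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
#endif
    return v;
}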

edited Apr 25 '14 at 14:54

answered Apr 25 '14 at 14:40

Paul R


Note that the two `_mm_set1_epi16` constants can be replaced by `_mm_cmpeq_epi16(v, v)` (every lane -1) and `_mm_srli_epi16(_mm_cmpeq_epi16(v, v), 15)` (every lane 1). This avoids domain stalls and encodes to just 1 or 2 instructions. – Raymond Chen Apr 28 '14 at 13:41
@RaymondChen: true, but any decent compiler will hoist these constants outside the loop. I guess it wouldn't hurt to declare them explicitly prior to the loop though. – Paul R Apr 28 '14 at 14:41
It depends how much register pressure you are under. It could be a win or loss either way - you just have to try it both ways and see. – Raymond Chen Apr 28 '14 at 17:10
@RaymondChen: again true, but in this particular case register usage is very low, even for a 32 bit build. – Paul R Apr 29 '14 at 01:51
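
The register-generated constants discussed in the comments above could be combined with the min/max approach as in the sketch below. It assumes SSE2 and uses the immediate-count shift `_mm_srli_epi16`; the function name `sgn_epi16_noconst` is made up for illustration. Whether this beats `_mm_set1_epi16` depends on the compiler and surrounding code, as the comments note.

```c
#include <emmintrin.h> /* SSE2 */

/* Sign of each 16-bit lane, with both constants derived from a compare. */
static __m128i sgn_epi16_noconst(__m128i v)
{
    __m128i minus_one = _mm_cmpeq_epi16(v, v);         /* every lane 0xFFFF, i.e. -1 */
    __m128i plus_one  = _mm_srli_epi16(minus_one, 15); /* logical shift: every lane 1 */

    v = _mm_min_epi16(v, plus_one);
    v = _mm_max_epi16(v, minus_one);
    return v;
}
```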

score 1 · Answer 2 · edited May 23 '17 at 12:08

Fill a register with zeros and compare it with your register, first with "greater than", then with "less than" (or swap the operands of the "greater than" instruction).
http://msdn.microsoft.com/en-us/library/xd43yfsa%28v=vs.90%29.aspx
http://msdn.microsoft.com/en-us/library/t863edb2%28v=vs.90%29.aspx

The problem at this point is that a true result is represented as 0xffff, which is -1: the correct result for negative numbers but not for positive ones. However, as pointed out by Raymond Chen in the comments, 0x0000 - 0xffff = 0x0001, so it is enough to subtract the result of "greater than" from the result of "less than". http://msdn.microsoft.com/en-us/library/y25yya27%28v=vs.90%29.aspx

Of course, Paul R's answer is preferable, as it uses only 2 instructions.
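
The compare-and-subtract steps described in this answer can be sketched as follows, assuming SSE2. The name `sgn_epi16_cmp` is hypothetical, and the "less than" comparison is expressed as "greater than" with swapped operands.

```c
#include <emmintrin.h> /* SSE2 */

/* Sign of each 16-bit lane via two compares and one subtraction. */
static __m128i sgn_epi16_cmp(__m128i v)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i neg = _mm_cmpgt_epi16(zero, v); /* 0xFFFF (-1) where v < 0, else 0 */
    __m128i pos = _mm_cmpgt_epi16(v, zero); /* 0xFFFF (-1) where v > 0, else 0 */

    /* neg - pos: -1 - 0 = -1, 0 - (-1) = 1, 0 - 0 = 0 */
    return _mm_sub_epi16(neg, pos);
}
```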

edited May 23 '17 at 12:08

Community


answered Apr 25 '14 at 13:55

Antonio


You don't need to shift. Just subtract in the opposite order. – Raymond Chen Apr 25 '14 at 14:03
If you do a signed subtraction, then there is no overflow. 0 - (-1) = 1. – Raymond Chen Apr 25 '14 at 16:38
I mean that you can do `_mm_sub_epi16(_mm_cmpgt_epi16(_mm_setzero_si128(), value), _mm_cmpgt_epi16(value, _mm_setzero_si128()))`. The point is that 0x0000 - 0xFFFF = 0x0001, so you don't need to shift at all. – Raymond Chen Apr 28 '14 at 13:36

score 0 · Answer 3 · answered Apr 25 '14 at 14:06

You can shift all 8 shorts at once using _mm_srai_epi16(tmp, 15), which returns eight 16-bit integers, each being all ones (i.e. -1) if the input was negative, or all zeros (i.e. 0) if it was zero or positive.
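
A minimal sketch of the arithmetic-shift approach, assuming SSE2 (the wrapper name `sign_mask_epi16` is made up for illustration). Zero and positive inputs both map to 0 here, so on its own this does not distinguish them.

```c
#include <emmintrin.h> /* SSE2 */

/* Each 16-bit lane becomes -1 if the input lane was negative, otherwise 0. */
static __m128i sign_mask_epi16(__m128i v)
{
    /* The arithmetic shift replicates the sign bit across the whole lane. */
    return _mm_srai_epi16(v, 15);
}
```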

answered Apr 25 '14 at 14:06

John Zwinck


The sign function should return 1 for positive values, 0 for zero, and -1 for negative values. http://en.wikipedia.org/wiki/Sign_function – Antonio Apr 25 '14 at 14:07
@Antonio: how do you know that is what the OP wants? This is an honest question: I haven't seen anything so specific from him. – John Zwinck Apr 25 '14 at 14:09
Actually having 0 for zero would be the result I require. – Michele Apr 25 '14 at 14:23
@Mike: my solution does give 0 for zero. -1 for negative, 0 otherwise. If you need something else, please be more specific. – John Zwinck Apr 25 '14 at 14:25
@Mike: please make your question more specific - it's vague about what result you require, and you've ignored comments asking for clarification. – Paul R Apr 25 '14 at 14:36
Oh sorry, I misunderstood Antonio's comment, and thought he was pointing out a different behaviour than this for all zeros. 1 for positive values, 0 for zero, and -1 for negative values is indeed the behaviour I want. I'll put it in the main question. Thanks – Michele Apr 25 '14 at 14:50
@JohnZwinck, I was working on this myself but Paul beat me to it (again). `__m128i s1 = _mm_srai_epi16(v, 15); __m128i s2 = _mm_add_epi16(_mm_set1_epi16(1),s1); __m128i s3 = _mm_add_epi16(s1,s2);` That seemed to do it but I guess Paul's answer is still better. – Z boson Apr 25 '14 at 15:21
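
Tracing the three-step shift-and-add variant from the last comment: for a zero input, s1 = 0, s2 = 1 and s3 = 1, so as written it appears to yield +1 rather than 0 for zero elements. The sketch below, assuming SSE2 and using the hypothetical name `sign_pm1_epi16`, reproduces those steps so this can be checked.

```c
#include <emmintrin.h> /* SSE2 */

/* Shift-and-add variant: -1 for negative lanes, +1 for zero and positive lanes. */
static __m128i sign_pm1_epi16(__m128i v)
{
    __m128i s1 = _mm_srai_epi16(v, 15);                /* -1 if negative, else 0 */
    __m128i s2 = _mm_add_epi16(_mm_set1_epi16(1), s1); /*  0 if negative, else 1 */
    return _mm_add_epi16(s1, s2);                      /* -1 if negative, else 1 */
}
```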
