How would one create a mask using SSE intrinsics which indicates whether the signs of two packed floats (__m128's) are the same for example if comparing a and b where a is [1.0 -1.0 0.0 2.0] and b is [1.0 1.0 1.0 1.0] the desired mask we would get is [true false true true].
2 Answers
Here's one solution:
const __m128i MASK = _mm_set1_epi32(0xffffffff);
__m128 a = _mm_setr_ps(1,-1,0,2);
__m128 b = _mm_setr_ps(1,1,1,1);
__m128 f = _mm_xor_ps(a,b);
__m128i i = _mm_castps_si128(f);
i = _mm_srai_epi32(i,31);
i = _mm_xor_si128(i,MASK);
f = _mm_castsi128_ps(i);
// i = (0xffffffff, 0, 0xffffffff, 0xffffffff)
// f = (0xffffffff, 0, 0xffffffff, 0xffffffff)
In this snippet, both i
and f
will have the same bitmask. I assume you want it in the __m128
type so I added the f = _mm_castsi128_ps(i);
to convert it back from an __m128i
.
Note that this code is sensitive to the sign of the zero. So 0.0
and -0.0
will affect the results.
Explanations:
The way the code works is as follows:
f = _mm_xor_ps(a,b); // xor the sign bits (well all the bits actually)
i = _mm_castps_si128(f); // Convert it to an integer. There's no instruction here.
i = _mm_srai_epi32(i,31); // Arithmetic shift that sign bit into all the bits.
i = _mm_xor_si128(i,MASK); // Invert all the bits
f = _mm_castsi128_ps(i); // Convert back. Again, there's no instruction here.

- 464,885
- 45
- 335
- 332
-
@Mystical, How would you do a Sign Function on float using this? – Royi Dec 24 '16 at 12:31
-
@Royi That's better as a separate question. – Mysticial Dec 25 '16 at 02:35
Have a look at the _mm_movemask_ps
instruction, which extracts the most significant bit (i.e. sign bit) from 4 floats. See http://msdn.microsoft.com/en-us/library/4490ys29.aspx
For example, if you have [1.0 -1.0 0.0 2.0], then movemask_ps will return 4, or 0100 in binary. So then if you get movemask_ps for each vector and compare the results (perhaps bitwise NOT XOR), then that will indicate whether all the signs are the same.
a = [1.0 -1.0 0.0 2.0]
b = [1.0 1.0 1.0 1.0]
movemask_ps a = 4
movemask_ps b = 0
NOT (a XOR b) = 0xB, or binary 1011
Hence signs are the same except in the second vector element.

- 2,861
- 1
- 21
- 30