
How would one create a mask using SSE intrinsics that indicates whether the signs of two packed floats (__m128s) are the same? For example, when comparing a and b, where a is [1.0 -1.0 0.0 2.0] and b is [1.0 1.0 1.0 1.0], the desired mask would be [true false true true].

cubiclewar

2 Answers


Here's one solution:

#include <emmintrin.h>  //  SSE2: integer shifts/xor and the ps <-> si128 casts

void example(void)
{
    const __m128i MASK = _mm_set1_epi32(-1);  //  all bits set (0xffffffff)

    __m128 a = _mm_setr_ps(1,-1,0,2);
    __m128 b = _mm_setr_ps(1,1,1,1);

    __m128  f = _mm_xor_ps(a,b);
    __m128i i = _mm_castps_si128(f);

    i = _mm_srai_epi32(i,31);
    i = _mm_xor_si128(i,MASK);

    f = _mm_castsi128_ps(i);

    //  i = (0xffffffff, 0, 0xffffffff, 0xffffffff)
    //  f = (0xffffffff, 0, 0xffffffff, 0xffffffff)
}

In this snippet, both i and f hold the same bitmask. I assume you want it as an __m128, so I added f = _mm_castsi128_ps(i); to convert it back from an __m128i.

Note that this code is sensitive to the sign of zero: 0.0 and -0.0 have different sign bits, so they are treated as having opposite signs.
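A minimal sketch of the same steps packaged as reusable functions, assuming SSE2 is available. The names same_sign_mask and same_sign_bits are hypothetical, chosen for illustration; same_sign_bits only packs the result into 4 bits with _mm_movemask_ps so it is easy to inspect and test, including the signed-zero behaviour described above.

```c
#include <emmintrin.h> /* SSE2; also pulls in the SSE1 float intrinsics */

/* Hypothetical helper: all bits set in each lane where a and b have the
 * same sign bit, all bits clear where the sign bits differ. */
__m128 same_sign_mask(__m128 a, __m128 b)
{
    const __m128i all_ones = _mm_set1_epi32(-1);

    __m128i i = _mm_castps_si128(_mm_xor_ps(a, b)); /* sign bit = 1 where signs differ */
    i = _mm_srai_epi32(i, 31);                      /* copy the sign bit across the lane */
    i = _mm_xor_si128(i, all_ones);                 /* invert: all ones where signs match */
    return _mm_castsi128_ps(i);
}

/* Hypothetical helper: packs the per-lane result into 4 bits (bit k = lane k). */
int same_sign_bits(__m128 a, __m128 b)
{
    return _mm_movemask_ps(same_sign_mask(a, b));
}
```

With a = _mm_setr_ps(1, -1, 0, 2) and b all ones, same_sign_bits returns 0xD (lanes 0, 2 and 3 match). Comparing (0.0, -0.0, 0.0, -0.0) against (1, 1, -1, -1) gives 0x9: -0.0 counts as negative and +0.0 as positive.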


Explanation:

The code works as follows:

f = _mm_xor_ps(a,b);       //  xor the sign bits (well all the bits actually)

i = _mm_castps_si128(f);   //  Convert it to an integer. There's no instruction here.

i = _mm_srai_epi32(i,31);  //  Arithmetic shift that sign bit into all the bits.

i = _mm_xor_si128(i,MASK); //  Invert all the bits

f = _mm_castsi128_ps(i);   //  Convert back. Again, there's no instruction here.
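As a variation on the steps above (not part of the original answer), the shift and the inversion can be folded into a single signed compare: a 32-bit lane is greater than -1 exactly when its sign bit is clear. A sketch assuming SSE2; the function name is hypothetical.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical variant: one compare replaces the srai + xor pair. */
__m128 same_sign_mask_cmp(__m128 a, __m128 b)
{
    __m128i x = _mm_castps_si128(_mm_xor_ps(a, b));        /* sign bit = 1 where signs differ */
    __m128i same = _mm_cmpgt_epi32(x, _mm_set1_epi32(-1)); /* all ones where that bit is 0 */
    return _mm_castsi128_ps(same);
}
```

It produces the same lane masks as the xor/shift/xor sequence, including the same treatment of -0.0, with one fewer vector operation after the xor.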
Mysticial

Have a look at the _mm_movemask_ps intrinsic (the MOVMSKPS instruction), which extracts the most significant bit (i.e. the sign bit) of each of the 4 floats into the low 4 bits of an int. See http://msdn.microsoft.com/en-us/library/4490ys29.aspx

For example, if you have [1.0 -1.0 0.0 2.0], listed from the highest element to the lowest (the argument order of _mm_set_ps), then movemask_ps returns 4, or 0100 in binary. If you get movemask_ps for each vector and compare the results with a bitwise NOT of their XOR, keeping only the low 4 bits, each set bit marks an element where the signs are the same.
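A minimal sketch of that comparison, assuming only SSE1; the function name is hypothetical. Note the & 0xF: ~ on an int also sets bits 4 to 31, so without the mask the result would be 0xFFFFFFFB rather than 0xB for the example.

```c
#include <xmmintrin.h> /* SSE1: _mm_movemask_ps */

/* Hypothetical helper: bit k is 1 when lane k of a and b have the same sign bit. */
int same_sign_bits_movemask(__m128 a, __m128 b)
{
    int ma = _mm_movemask_ps(a); /* sign bits of a, lane 0 in bit 0 */
    int mb = _mm_movemask_ps(b); /* sign bits of b */
    return ~(ma ^ mb) & 0xF;     /* keep only the four lane bits */
}
```

Element order matters when reading the bits: _mm_movemask_ps(_mm_set_ps(1, -1, 0, 2)) is 4, while _mm_movemask_ps(_mm_setr_ps(1, -1, 0, 2)) is 2, because _mm_setr_ps puts its first argument in lane 0.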

a = [1.0 -1.0 0.0 2.0]
b = [1.0 1.0 1.0 1.0]
movemask_ps a = 4 (binary 0100)
movemask_ps b = 0 (binary 0000)
NOT (4 XOR 0) AND 0xF = 0xB, or binary 1011

Hence the signs are the same in every element except the second one as written (-1.0 vs 1.0).
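The question asks for a vector mask rather than a 4-bit integer. If that form is needed after the movemask route, one common way to expand the bits back into lanes is to broadcast the value, isolate each lane's own bit, and compare. A sketch assuming SSE2; bits_to_mask is a hypothetical name, and the bit layout (bit k = lane k) is the one _mm_movemask_ps produces.

```c
#include <emmintrin.h> /* SSE2 integer ops */

/* Hypothetical helper: all ones in lane k if bit k of bits is set, else zero. */
__m128 bits_to_mask(int bits)
{
    const __m128i lane_bit = _mm_setr_epi32(1, 2, 4, 8); /* the bit owned by each lane */
    __m128i v = _mm_set1_epi32(bits);                    /* broadcast the 4-bit value */
    v = _mm_and_si128(v, lane_bit);                      /* keep only this lane's bit */
    v = _mm_cmpeq_epi32(v, lane_bit);                    /* all ones if that bit was set */
    return _mm_castsi128_ps(v);
}
```

This round trip costs extra operations compared with building the mask directly in vector registers, so it is mainly useful when the 4-bit form is also needed, for example to branch on "all signs equal" with a single test against 0xF.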

Gnat