How do I test if a __m128i variable has any nonzero value on SSE-2-and-earlier processors?
12

user541686
-
Do you mean a non-zero bit, or an 8 / 16 / 32-bit integer element? – Brett Hale Nov 03 '11 at 03:44
-
@BrettHale: I'm testing to see if they're all zero. – user541686 Nov 03 '11 at 03:46
2 Answers
12
In SSE2 you can do:
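    __m128i zero = _mm_setzero_si128();
    // cmpeq sets every 32-bit element that equals zero to all ones;
    // movemask then collects the top bit of each of the 16 bytes.
    if (_mm_movemask_epi8(_mm_cmpeq_epi32(x, zero)) == 0xFFFF)
    {
        // the code for the all-zero case...
    }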
This compares the four 32-bit ints against zero and then returns one mask bit per byte, so the bits for each int start at offsets 0, 4, 8 and 12. The mask equals 0xFFFF only when every element is zero, so any other value means at least one bit of x is set. If you keep the mask around, you can also work with the individual elements directly if need be.
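To make the mask layout described above concrete, here is a minimal C sketch using only SSE2 intrinsics. The helper names (`zero_byte_mask`, `is_all_zero`, `has_nonzero`, `lane_is_zero`) are hypothetical and chosen for illustration; they are not part of the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdbool.h>

/* 16-bit mask: bits 4*i .. 4*i+3 are set when 32-bit element i equals zero. */
static inline int zero_byte_mask(__m128i x)
{
    __m128i eq = _mm_cmpeq_epi32(x, _mm_setzero_si128());
    return _mm_movemask_epi8(eq);
}

/* True when every bit of x is zero. */
static inline bool is_all_zero(__m128i x)
{
    return zero_byte_mask(x) == 0xFFFF;
}

/* True when at least one bit of x is set. */
static inline bool has_nonzero(__m128i x)
{
    return zero_byte_mask(x) != 0xFFFF;
}

/* Inspect one 32-bit element (lane 0..3) using a mask saved earlier. */
static inline bool lane_is_zero(int byte_mask, int lane)
{
    return ((byte_mask >> (4 * lane)) & 0xF) == 0xF;
}
```

Keeping the result of `zero_byte_mask` lets a caller answer both the "any nonzero?" question and the per-element question from a single compare.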

Necrolis
-
+1, it's better than mine. :) I've never used the movemask instruction so I didn't know you could do that. XD – Mysticial Nov 03 '11 at 06:50
-
There's a bug in the otherwise excellent answer - if you're checking for all zeros, it should be `if(_mm_movemask_epi8(_mm_cmpeq_epi32(x,zero)) == 0xFFFF)`. This is because `_mm_cmpeq_epi32` sets the int to all 1's, not all 0's, if it's equal to zero, and then `_mm_movemask_epi8` sets the low 16 bits of its result based on the most significant bit of each byte in the argument. Hopefully the author can edit the answer - I tried but was rejected. – FarmerBob May 28 '15 at 23:57
-
@LeonidTsybert: re-read the original question, the OP wanted code to run on any *non-zero* value, i.e. when the vector contains elements **not equal** to zero. My code tests that all four values are `!= 0`, and my comments about the masking allow for individually checking each packed value – Necrolis May 29 '15 at 14:20
-
I read the original question differently from you. Your code does what you say it does, that is, it checks if all four 32-bit values are non-zero. I interpret the question as asking whether "any" value is non-zero, as stated in the body of the question, or conversely whether they are all zero, as in the title of the question and the OP's clarification to Brett Hale. If that's what is needed (and it's what I needed for my project, which led me to find this question), then you need to test against 0xFFFF. – FarmerBob Jun 02 '15 at 07:08
-
@LeonidTsybert: I can update the mask, but TBH if you can't read the comments as to what the code does, you shouldn't be touching SIMD intrinsics... – Necrolis Jun 08 '15 at 21:27
-
5
For the sake of completeness, with SSE4.1 one can use `_mm_testz_si128`:
    const bool isAllZero = _mm_testz_si128(a,a);
Note that this is true when all bits are zero.
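A minimal self-contained sketch of the SSE4.1 variant follows. The function names and the `SSE41_FN` macro are hypothetical. The macro assumes GCC or Clang and only exists so the file builds without `-msse4.1`; the CPU still has to support SSE4.1 at run time.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (_mm_testz_si128) */
#include <stdbool.h>

/* Assumption: GCC/Clang target attribute; other compilers get an empty macro. */
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_FN __attribute__((target("sse4.1")))
#else
#define SSE41_FN
#endif

/* _mm_testz_si128(a, a) returns 1 when (a AND a) == 0, i.e. a is all zero. */
SSE41_FN static inline bool is_all_zero_sse41(__m128i a)
{
    return _mm_testz_si128(a, a) != 0;
}

/* The question's "any nonzero value" case is simply the negation. */
SSE41_FN static inline bool has_nonzero_sse41(__m128i a)
{
    return _mm_testz_si128(a, a) == 0;
}
```

Unlike the SSE2 version, this needs no zeroed register to compare against.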

Antonio
-
This is actually slightly faster, and doesn't need an all-zero register to test against. `ptest` / `jz` is 2 + 1 uop (doesn't macro-fuse). `pcmpeq`(1uop) / `pmovmskb`(1uop) / `and 0xffff` (1uop) / `cmp 0xffff/je` (1uop). If you were testing the other case (*any* zero elements, rather than *all* zero elements), they'd be approx the same performance on current Intel and AMD CPUs: `ptest`/`jnz` (3 uops) vs. `pcmpeq` / `pmovmskb` / `test/jnz` (3 uops). – Peter Cordes Mar 09 '16 at 16:11
-
@PeterCordes What about, in that case, having a register set at all ones, and using `_mm_testc_si128`? Something like `const bool atLeastOneZero = _mm_testc_si128(a,allOnes)` – Antonio Mar 09 '16 at 17:52
-
Again, `ptest` is slightly faster. To do it without `ptest`, you'd `pcmpeq` against the all-ones vector, and then proceed with exactly the same sequence for checking that all elements matched. Checking for all-zero or all-one with `pcmpeq` is the same as for checking for == to any other pattern, except that the constants are easier to generate on the fly (`pxor same,same` or `pcmpeqw same,same`). – Peter Cordes Mar 09 '16 at 18:14
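The point in the last comment, that an all-ones or all-zero check is just a compare against a particular pattern, can be sketched with SSE2 alone. The helper names below (`equals_pattern`, `is_all_ones`) are hypothetical. How the constant is materialized is left to the compiler, which may or may not pick the `pcmpeq same,same` idiom.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdbool.h>

/* True when all 16 bytes of x match the corresponding bytes of pattern. */
static inline bool equals_pattern(__m128i x, __m128i pattern)
{
    __m128i eq = _mm_cmpeq_epi8(x, pattern);  /* 0xFF in each matching byte */
    return _mm_movemask_epi8(eq) == 0xFFFF;   /* all 16 bytes matched */
}

/* True when every bit of x is set: compare against the all-ones vector. */
static inline bool is_all_ones(__m128i x)
{
    return equals_pattern(x, _mm_set1_epi32(-1));
}
```

With this helper, "x contains at least one zero bit" is `!is_all_ones(x)`. For the SSE4.1 route, note that `_mm_testc_si128(a, allOnes)` returns 1 when every bit of `a` is set, so that same question would be its negation.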