It's unclear whether you want the result in an XMM register, in preparation for some masking, or in a general-purpose register (GPR), in preparation for, say, branching.
Alternative 1
This may be a more flexible alternative because it leaves a mask in an XMM register, and from there to the GPRs is just a PMOVMSKB away. It does however cost two 128-bit constants.
This is the simple approach: on the top two integers, compare for > -1 (i.e. >= 0) while giving the bottom two an impossible comparison; then, on the bottom two, compare for < 1 (i.e. <= 0) while giving the top two an impossible comparison. OR the two results together and you have your mask. If all bits are set, every integer met its condition and the test is true; otherwise it is false.
__m128i result;
/* ... */
__m128i TOP = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF);
__m128i BOT = _mm_set_epi32(0x80000000, 0x80000000, 0x00000001, 0x00000001);
__m128i cmpT = _mm_cmpgt_epi32(result, TOP); /* Top: result > -1. Bottom: result > INT_MAX (never true). */
__m128i cmpB = _mm_cmpgt_epi32(BOT, result); /* Bottom: result < 1. Top: result < INT_MIN (never true). */
__m128i cmp = _mm_or_si128(cmpT, cmpB);
int cond = _mm_movemask_epi8(cmp) == 0xFFFF;
/* cond contains the result of the comparison:
0 if check failed and
1 if check satisfied. */
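For experimenting with this alternative, here is a minimal sketch that wraps the snippet above in two functions. The names range_mask and check_alt1 are hypothetical, and the constants are written as -1, INT_MAX and INT_MIN, which are the same bit patterns as 0xFFFFFFFF, 0x7FFFFFFF and 0x80000000. Splitting the mask from the final reduction is only meant to illustrate the point that the mask stays usable in an XMM register.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>    /* INT_MAX, INT_MIN */

/* Hypothetical helper: all-ones in each 32-bit lane that meets its
   condition (lanes 3 and 2: >= 0; lanes 1 and 0: <= 0), else zero. */
static __m128i range_mask(__m128i result)
{
    const __m128i TOP = _mm_set_epi32(-1, -1, INT_MAX, INT_MAX);
    const __m128i BOT = _mm_set_epi32(INT_MIN, INT_MIN, 1, 1);
    __m128i cmpT = _mm_cmpgt_epi32(result, TOP); /* lanes 3,2: result > -1 */
    __m128i cmpB = _mm_cmpgt_epi32(BOT, result); /* lanes 1,0: result < 1  */
    return _mm_or_si128(cmpT, cmpB);
}

/* Hypothetical helper: 1 if every lane meets its condition, else 0. */
static int check_alt1(__m128i result)
{
    return _mm_movemask_epi8(range_mask(result)) == 0xFFFF;
}
```

Calling check_alt1(_mm_set_epi32(5, 0, 0, -7)) should yield 1, while making either of the top two lanes negative or either of the bottom two positive should yield 0.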
Alternative 2
Here I use PMOVMSKB on both the original value and its negation (computed with PSUBD), then check the relevant bits of the two returned bitmasks for the expected values.
__m128i result;
/* ... */
__m128i ZERO = _mm_setzero_si128(); /* 0 constant */
__m128i neg = _mm_sub_epi32(ZERO, result); /* Negate */
int lt0 = _mm_movemask_epi8(result); /* < 0 ? */
int gt0 = _mm_movemask_epi8(neg); /* > 0 ? */
gt0 &= ~lt0; /* Correction for INT_MIN. Can be
deleted if never encountered. */
int cond = !((gt0 | (lt0 >> 8)) & 0x88); /* Check both bits 3 and 7 are 0 */
/* cond contains the result of the comparison:
0 if check failed and
1 if check satisfied. */
My explanation:
- I negate the integers.
- I extract the sign bits, lt0, from the integers. They represent the condition result[i] < 0.
- I extract the sign bits, gt0, from the negations. They represent the condition result[i] > 0, except when result[i] is INT_MIN.
- Optional: I detect and correct that case (gt0 &= ~lt0 sets to 0 any false report that -2147483648 is > 0).
- I then check whether all of the following hold:
  - Bit 3 of gt0 is 0. Implies result[0] <= 0.
  - Bit 7 of gt0 is 0. Implies result[1] <= 0.
  - Bit 11 of lt0 is 0. Implies result[2] >= 0.
  - Bit 15 of lt0 is 0. Implies result[3] >= 0.
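The steps above can be sketched as a single function, which also makes the INT_MIN corner case easy to exercise. The name check_alt2 and the fix_int_min parameter are hypothetical and exist only for this illustration; passing 0 skips the optional correction.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>    /* INT_MIN */

/* Hypothetical helper: 1 if result[0] <= 0, result[1] <= 0,
   result[2] >= 0 and result[3] >= 0, else 0. */
static int check_alt2(__m128i result, int fix_int_min)
{
    __m128i neg = _mm_sub_epi32(_mm_setzero_si128(), result); /* negate   */
    int lt0 = _mm_movemask_epi8(result); /* sign bits of the originals    */
    int gt0 = _mm_movemask_epi8(neg);    /* sign bits of the negations    */
    if (fix_int_min)
        gt0 &= ~lt0;                     /* -INT_MIN wraps back to INT_MIN */
    /* Bits 3 and 7 of gt0 cover result[0..1]; shifting lt0 right by 8
       moves bits 11 and 15 (result[2..3]) down onto bits 3 and 7. */
    return !((gt0 | (lt0 >> 8)) & 0x88);
}
```

With result[1] equal to INT_MIN and all other elements 0, the corrected version should report 1, whereas the uncorrected one reports 0 because the negation of INT_MIN still has its sign bit set.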
We look at bits 3, 7, 11 and 15, and use the magic constants 8 and 0x88, because PMOVMSKB returns one sign bit per byte, not one sign bit per dword. The bits we are actually interested in are therefore surrounded by junk bits that we must ignore; only the sign bit of the top byte of each integer interests us.
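To see the junk bits concretely, here is a small sketch that picks bits 3, 7, 11 and 15 out of the PMOVMSKB result and packs them into four bits, one per dword. The name dword_sign_bits is hypothetical and not part of either alternative; it only illustrates which bits of the byte mask matter.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>    /* INT_MIN */

/* Hypothetical helper: bit i of the return value is the sign bit of
   32-bit element i; the other twelve byte-sign bits are discarded. */
static int dword_sign_bits(__m128i v)
{
    int m = _mm_movemask_epi8(v); /* 16 bits, one per byte */
    return ((m >> 3)  & 1)   /* bit 3  -> bit 0 (element 0) */
         | ((m >> 6)  & 2)   /* bit 7  -> bit 1 (element 1) */
         | ((m >> 9)  & 4)   /* bit 11 -> bit 2 (element 2) */
         | ((m >> 12) & 8);  /* bit 15 -> bit 3 (element 3) */
}
```

For example, the element 0x00FFFFFF is non-negative, yet its three low bytes each contribute a set bit to the byte mask; only the top byte's bit reflects the sign of the integer.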
In total this makes 9-10 instructions to run the check.