Get index of first element that is not zero in a __m256 variable

Question

__m256  dst = _mm256_cmp_ps(value1, value2, _CMP_LE_OQ);

If dst is [0,0,0,-nan, 0,0,0,-nan], I want to find the index of the first -nan (3 in this case) without a for loop of 8 iterations. Is this possible?

Think of the compare-result as integers or masks, even though it's in a `__m256`. It's 0 or all-ones, which is 2's complement -1 = unsigned 0xFFFFFFFF. I mean yes if interpreted as IEEE binary32, they are the bit patterns for 0 or -NaN, but that's rarely how you want to use them. — Peter Cordes, Apr 01 '19 at 02:05

score 9 · Accepted Answer · answered Mar 31 '19 at 10:20


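I would apply `movmskps` to the result of the comparison and then do a bit scan forward. The movemask packs the sign bit of each of the 8 elements into the low 8 bits of an integer, so the index of the lowest set bit is the index of the first element for which the comparison was true.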

Using intrinsics (`__builtin_ctz` works with gcc/clang; other compilers need a different bit-scan intrinsic):

int pos = __builtin_ctz(_mm256_movemask_ps(dst));

Note that the result of `bsf` is undefined if no bit is set, and so is `__builtin_ctz(0)`. To work around this you can OR in a sentinel bit, e.g., to get 8 if none of the low 8 bits is set:

int pos = __builtin_ctz(_mm256_movemask_ps(dst) | 0x100);
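Putting the pieces together, here is a minimal sketch of the whole sequence. The function name `first_le_index`, the unaligned loads from plain float arrays, and the `target` attribute are assumptions made for illustration and are not part of the original answer. The attribute is gcc/clang-specific; compiling the file with `-mavx` should work as well.

```c
#include <immintrin.h>

/* Hypothetical helper: index of the first i in [0, 8) with a[i] <= b[i],
   or 8 if there is no such element. Requires a CPU with AVX. */
__attribute__((target("avx")))
int first_le_index(const float *a, const float *b)
{
    __m256 va  = _mm256_loadu_ps(a);                 /* load 8 floats, no alignment needed */
    __m256 vb  = _mm256_loadu_ps(b);
    __m256 dst = _mm256_cmp_ps(va, vb, _CMP_LE_OQ);  /* each element: all-ones or zero */
    int mask   = _mm256_movemask_ps(dst);            /* bit i = sign bit of element i */
    /* Bit 8 is a sentinel so that ctz never sees a zero argument. */
    return __builtin_ctz((unsigned)mask | 0x100u);
}
```

With `a = {5, 5, 5, 1, 5, 5, 5, 1}` and every element of `b` equal to 2, the comparison is true only at indices 3 and 7, so the function returns 3, as in the question. If no element compares true it returns 8.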


chtz



Note that on newer CPUs (Intel Haswell or newer, for example), you can use `_tzcnt_u32()` instead of `__builtin_ctz()`. The `_tzcnt_u32()` intrinsic can be used with all major compilers (gcc, icc, clang, MSVC). It maps to the `tzcnt` instruction, which is well defined for zero inputs too. – wim Apr 05 '19 at 18:28
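A sketch of the `tzcnt` variant suggested in the comment above. The name `first_le_index_tzcnt` and the array-based interface are hypothetical, and the `target` attribute assumes gcc/clang (alternatively, compile with `-mavx -mbmi`). The code must run on a CPU with BMI1; on older CPUs the same encoding executes as `bsf` and the zero case is no longer well defined.

```c
#include <immintrin.h>

/* Hypothetical helper: index of the first i in [0, 8) with a[i] <= b[i].
   Returns 32 when no element qualifies, because tzcnt of 0 yields the
   operand width. Requires a CPU with AVX and BMI1. */
__attribute__((target("avx,bmi")))
unsigned first_le_index_tzcnt(const float *a, const float *b)
{
    __m256 dst = _mm256_cmp_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b), _CMP_LE_OQ);
    unsigned mask = (unsigned)_mm256_movemask_ps(dst);  /* one bit per element */
    return _tzcnt_u32(mask);                            /* no sentinel bit needed */
}
```

The caller can treat any result of 8 or more as "not found", so the sentinel OR from the accepted answer is not needed here.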
