__m256  dst = _mm256_cmp_ps(value1, value2, _CMP_LE_OQ);

If dst is [0,0,0,-nan, 0,0,0,-nan], I want to find the index of the first -nan (3 in this case) without a for loop with 8 iterations. Is this possible?

hidayat
    Think of the compare-result as integers or masks, even though it's in a `__m256`. It's 0 or all-ones, which is 2's complement -1 = unsigned 0xFFFFFFFF. I mean yes if interpreted as IEEE binary32, they are the bit patterns for 0 or -NaN, but that's rarely how you want to use them. – Peter Cordes Apr 01 '19 at 02:05

1 Answer


I would `movmskps` the result of the comparison (which collects the sign bit of each lane into an integer bitmask) and then do a bit scan forward to find the lowest set bit.

Using intrinsics (this works with gcc/clang, see here for alternatives):

int pos = __builtin_ctz(_mm256_movemask_ps(dst));

Note that the result of `bsf` is unspecified if no bit is set (and `__builtin_ctz(0)` is likewise undefined). To work around this you can, e.g., set bit 8 as a sentinel, so that the result is 8 if no lane matched:

int pos = __builtin_ctz(_mm256_movemask_ps(dst) | 0x100);
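Putting the pieces together, a minimal sketch could look like the following. The helper name `first_le_index` and the use of unaligned loads from plain float arrays are assumptions made for illustration; the `target("avx")` attribute is a gcc/clang feature that enables AVX for this one function, so compiling the whole file with `-mavx` is an alternative.

```c
#include <immintrin.h>

/* Hypothetical helper: returns the index of the first lane i with
   a[i] <= b[i], or 8 if no lane matches. Both pointers must refer to
   at least 8 floats. */
__attribute__((target("avx")))
static int first_le_index(const float *a, const float *b)
{
    __m256 value1 = _mm256_loadu_ps(a);
    __m256 value2 = _mm256_loadu_ps(b);

    /* Each lane becomes all-ones (true) or all-zeros (false). */
    __m256 dst = _mm256_cmp_ps(value1, value2, _CMP_LE_OQ);

    /* Bit i of the mask is the sign bit of lane i. */
    unsigned mask = (unsigned)_mm256_movemask_ps(dst);

    /* Bit 8 is a sentinel, so the argument to ctz is never zero. */
    return __builtin_ctz(mask | 0x100u);
}
```

With `a = {1,1,1,0, 1,1,1,0}` and `b` all zeros, lanes 3 and 7 compare true, so the mask is 0x88 and the function returns 3.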
chtz
    Note that on newer CPUs, such as, e.g., Intel Haswell or newer, you can use `_tzcnt_u32()` instead of `__builtin_ctz()`. Intrinsic `_tzcnt_u32()` can be used with all major compilers (gcc, icc, clang, MSVC). It maps to the `tzcnt` instruction, which is well defined for zero inputs too. – wim Apr 05 '19 at 18:28
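
A sketch of the `_tzcnt_u32()` variant from the comment above, assuming a CPU that supports BMI1 (on older CPUs the instruction encoding is executed as `bsf`, and the zero-input guarantee is lost). The helper name `first_le_index_tzcnt` is hypothetical, and the `target` attribute is again gcc/clang-specific.

```c
#include <immintrin.h>

/* Hypothetical helper: returns the index of the first lane i with
   a[i] <= b[i]. If no lane matches, the mask is zero and tzcnt
   returns the operand width, i.e. 32. */
__attribute__((target("avx,bmi")))
static unsigned first_le_index_tzcnt(const float *a, const float *b)
{
    __m256 dst = _mm256_cmp_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                               _CMP_LE_OQ);
    unsigned mask = (unsigned)_mm256_movemask_ps(dst);

    /* No sentinel bit needed: tzcnt is well defined for a zero input. */
    return _tzcnt_u32(mask);
}
```

Here "no match" is reported as 32 rather than 8, so callers would test for `result >= 8` instead of comparing against a sentinel index.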