When compiling

bool is_nan(double x){
   return x!=x;
}

both clang and gcc utilize the parity-flag PF:

_Z6is_nand: # @_Z6is_nand
  ucomisd %xmm0, %xmm0
  setp %al
  retq

However, the two possible outcomes of the comparison are:

       NaN    Not-NaN
ZF      1        1
PF      1        0
CF      1        0

That means it would also be possible to use CF as an alternative, i.e. setb instead of setp.

Are there any advantages to using setp over setb, or is it a coincidence that both compilers use the parity flag?

PS: This question is a follow-up to Understanding compilation result for std::isnan

ead

1 Answer

The advantage is that the compiler emits this code naturally without needing a special case to recognize x!=x and transform it into !(x >= x).

Without -ffast-math, x != y has to check PF to see if the comparison is ordered, then check ZF for equality. In the special case where both inputs are the same, normal optimization mechanisms like CSE can presumably get rid of the ZF check, leaving only PF.
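The PF-then-ZF reasoning can be sketched as a small C++ model. This is an illustration under stated assumptions, not compiler output: `Flags`, `ucomisd_model`, `not_equal_general`, `is_nan_setp` and `is_nan_setb` are hypothetical names introduced here, and the model only mirrors the ZF/PF/CF results of an unordered, equal, less-than, or greater-than comparison.

```cpp
#include <cmath>

// Hypothetical model of the flags written by a ucomisd-style compare.
// Unordered (either input NaN): ZF=PF=CF=1. Equal: ZF=1. Less-than: CF=1.
struct Flags {
    bool zf;
    bool pf;
    bool cf;
};

Flags ucomisd_model(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};  // unordered
    if (a == b) return {true, false, false};                        // equal
    if (a < b) return {false, false, true};                         // a below b
    return {false, false, false};                                   // a above b
}

// General x != y: true if unordered (PF), or ordered and not equal (!ZF).
bool not_equal_general(double a, double b) {
    Flags f = ucomisd_model(a, b);
    return f.pf || !f.zf;
}

// With identical operands ZF is always 1, so only the PF test survives.
bool is_nan_setp(double x) { return ucomisd_model(x, x).pf; }

// For identical operands CF carries the same information as PF.
bool is_nan_setb(double x) { return ucomisd_model(x, x).cf; }
```

In this model `is_nan_setp` and `is_nan_setb` agree for every input, which matches the flag table in the question: comparing a value against itself can only come out unordered or equal, never below or above.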

In this case, setb wouldn't be worse, but it has no advantage, it's more confusing for humans, and the compiler probably needs more special-case code to emit it.

Your suggested transformation would only be useful when using the result with a special instruction that uses CF, like adc. For example, nan_counter += arr[i] != arr[i]. That auto-vectorizes trivially (cmp_unord_ps / psubd), but scalar cleanup (or a scalar use-case over non-array inputs) could use ucomiss / adc $0, %eax instead of ucomiss / setp / add.

That saves an instruction, and a uop on Broadwell and later, and on AMD. (Earlier Intel CPUs have 2-uop adc, unless they special-case $0, because they don't support 3-input uops.)

Peter Cordes
  • Actually, neither gcc nor clang comes up with your cool trick for `nan_counter`: https://godbolt.org/g/jWtSeo The upper bits of `%eax` need to be cleared as well, which means an additional `xorl %eax, %eax` for the setp-solution. – ead Jul 17 '18 at 06:34
  • @ead: missed optimizations are unfortunately common. Thanks for checking on that; I suspected compilers would fail to find it. The point of my answer is that *that's* what you should be complaining about, not `setp` instead of `setb`, not that there's nothing to complain about :P – Peter Cordes Jul 17 '18 at 06:40
  • Re: xor-zeroing: I left that out because you can hoist it out of a scalar loop and repeatedly set the low byte of the same otherwise-zeroed reg. But yeah, often you will need the extra instruction if there's more work in the loop, and gcc especially likes to break dependencies, so it would probably not hoist xor-zeroing even if it could. At least the zero-extension isn't on the critical path unless gcc gives up and falls back to `setp` / `movzx`, like gcc usually used to. (Hmm, even old gcc uses xor/ucomisd/setp for FP. gcc8.1 -m32 uses `cmp/setne/movzbl` for integer https://godbolt.org/g/oEGWnT) – Peter Cordes Jul 17 '18 at 06:43