CUDA __float_as_int in acosf implementation

Question

CUDA C's maths function implementation (cuda/math_function.h) of acosf contains the passage:

if (__float_as_int(a) < 0) {
  t1 = CUDART_PI_F - t1;
}

where a and t1 are floats and CUDART_PI_F is a float constant set to a value close to the mathematical constant pi. I am trying to understand what the if-clause is testing for, and what the C equivalent of it, or of the function/macro __float_as_int(a), would be. I searched for the implementation of __float_as_int() without success; it seems to be a macro or function built into NVIDIA's NVCC. Looking at the PTX that NVCC produces from the above passage:

    .reg .u32 %r<4>;
    .reg .f32 %f<46>;
    .reg .pred %p<4>;
    // ...
    mov.b32         %r1, %f1;
    mov.s32         %r2, 0;
    setp.lt.s32     %p2, %r1, %r2;
    selp.f32        %f44, %f43, %f41, %p2;

it becomes clear that __float_as_int() is not a float-to-int conversion with rounding (that would have yielded a cvt.s32.f32). Instead, it copies the bits of the float %f1 unchanged (mov.b32) into %r1, which is declared as .u32 (unsigned int), and then compares %r1 with %r2 (whose value is 0) as if it were an s32 (signed int), which I find confusing.

To me this looks a little odd. But obviously it is correct.

Can someone explain what is going on, and in particular what __float_as_int() is doing in an if-clause that tests for being negative (< 0)? Could you also provide a C equivalent of the if-clause and/or the __float_as_int() macro?

score 4 · Accepted Answer · answered Dec 10 '12 at 13:48


__float_as_int reinterprets the bits of a float as an int. A signed int is < 0 exactly when its most significant bit is set. In a float that bit is the sign bit, so the test checks whether the sign bit is set. That is not exactly the same as the number being negative (e.g. it can be 'negative zero'). It can be faster to check this than to check whether the float is < 0.0.

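A C equivalent could look like:

int __float_as_int(float in) {
     /* Both members share the same storage, so writing f and
        reading i returns the float's bit pattern unchanged. */
     union fi { int i; float f; } conv;
     conv.f = in;
     return conv.i;
}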
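As a hedged illustration of the points above, the following sketch reproduces the sign-bit test on the host with memcpy instead of a union. The names float_as_int_sketch, sign_bit_set and check_examples are hypothetical and not part of CUDA, and the code assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for __float_as_int: copies the 32 bits
   of a float into an int32_t without any numeric conversion.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static int32_t float_as_int_sketch(float in)
{
    int32_t out;
    memcpy(&out, &in, sizeof out);
    return out;
}

/* The condition from the acosf passage: true when the sign bit is set. */
static int sign_bit_set(float a)
{
    return float_as_int_sketch(a) < 0;
}

/* A few spot checks of the behaviour described in the answer. */
static void check_examples(void)
{
    assert(float_as_int_sketch(1.0f) == 0x3f800000); /* bit pattern, not 1 */
    assert(sign_bit_set(-2.5f));
    assert(!sign_bit_set(2.5f));
    assert(sign_bit_set(-0.0f));   /* negative zero has the sign bit set */
    assert(!(-0.0f < 0.0f));       /* yet it does not compare below zero */
}
```

Calling check_examples() from a small main should pass silently on such a platform. The last two assertions show the one case mentioned above where the sign-bit test and a < 0.0f disagree.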

In some other versions of this header, __cuda___signbitf is used instead.
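For comparison on the host side, standard C99 offers the signbit macro in math.h, which reads the same bit. The sketch below is only an illustration under the assumption of a 32-bit IEEE 754 float; the helper names are hypothetical, and it makes no claim about how __cuda___signbitf itself is implemented.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sign test via bit reinterpretation, as in the acosf passage. */
static int sign_via_bits(float a)
{
    int32_t i;
    memcpy(&i, &a, sizeof i);
    return i < 0;
}

/* Sign test via the C99 signbit macro, normalised to 0 or 1. */
static int sign_via_signbit(float a)
{
    return signbit(a) != 0;
}

/* Ordinary floating-point comparison against zero. */
static int sign_via_compare(float a)
{
    return a < 0.0f;
}

/* The first two tests agree on these inputs; the comparison
   differs from them only for negative zero. */
static void compare_sign_tests(void)
{
    const float samples[] = { 3.0f, -3.0f, 0.0f, -0.0f };
    for (int k = 0; k < 4; ++k) {
        assert(sign_via_bits(samples[k]) == sign_via_signbit(samples[k]));
    }
    assert(sign_via_bits(-0.0f) == 1);
    assert(sign_via_compare(-0.0f) == 0);
}
```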

answered Dec 10 '12 at 13:48

zch


Interesting! So it's for performance reasons implemented like this. – ritter Dec 10 '12 at 14:06
Single-precision floating-point comparison is no slower than an integer comparison on the GPU. The earlier use of signbitf() in this if-statement required a bit extraction plus a comparison, however. – njuffa Dec 10 '12 at 18:26
