What is the reason for the catastrophic performance of pow() for NaN values? As far as I can work out, NaNs should not have an impact on performance if the floating-point math is done with SSE instead of the x87 FPU.
This seems to be true for elementary operations, but not for pow(). I compared multiplication and division of a double to squaring and then taking the square root. If I compile the piece of code below with g++ -lrt, I get the following result:
multTime(3.14159): 20.1328ms
multTime(nan): 244.173ms
powTime(3.14159): 92.0235ms
powTime(nan): 1322.33ms
As expected, calculations involving NaN take considerably longer. Compiling with g++ -lrt -msse2 -mfpmath=sse, however, results in the following times:
multTime(3.14159): 22.0213ms
multTime(nan): 13.066ms
powTime(3.14159): 97.7823ms
powTime(nan): 1211.27ms
The multiplication and division of NaN are now much faster (actually faster than with a real number), but squaring and then taking the square root still take a very long time.
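To double-check that the two sets of flags really change how the compiler emits scalar floating-point code, something like the following sketch might help. It relies on the GCC predefined macro __SSE2_MATH__ (set when -msse2 -mfpmath=sse is in effect) and on FLT_EVAL_METHOD from <cfloat>; the helper name fpMathMode is made up for illustration. Note that it only describes the code in my own translation unit, not whatever pow() does inside libm.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper (name invented for this sketch): reports which unit
// the compiler targets for scalar double arithmetic in this translation unit.
// It says nothing about how the precompiled pow() in libm is implemented.
inline std::string fpMathMode()
{
#if defined(__SSE2_MATH__)
    // GCC defines this when double math is done in SSE2 registers.
    return "sse2";
#elif defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 2
    // Intermediates are evaluated in long double, which is typical for x87.
    return "x87";
#else
    return "unknown";
#endif
}
```

Printing fpMathMode() at the start of main() for each build would confirm which variant was actually compiled.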
Test code (compiled with gcc 4.1.2 on 32-bit openSUSE 10.2 in VMware; the CPU is a Core i7-2620M):
    #include <iostream>
    #include <time.h>   // clock_gettime, CLOCK_PROCESS_CPUTIME_ID
    #include <cmath>    // pow, NAN

    // Times one million double/halve round trips on d.
    void multTime( double d )
    {
        struct timespec startTime, endTime;
        double durationNanoseconds;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
        for(int i=0; i<1000000; i++)
        {
            d = 2*d;
            d = 0.5*d;
        }
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
        durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
        std::cout << "multTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
    }

    // Times one million square/square-root round trips on d, both done via pow().
    void powTime( double d )
    {
        struct timespec startTime, endTime;
        double durationNanoseconds;
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
        for(int i=0; i<1000000; i++)
        {
            d = pow(d,2);
            d = pow(d,0.5);
        }
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
        durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
        std::cout << "powTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
    }

    int main()
    {
        multTime(3.14159);
        multTime(NAN);
        powTime(3.14159);
        powTime(NAN);
    }
Edit:
Unfortunately, my knowledge on this topic is extremely limited, but I guess that the glibc pow() never uses SSE on a 32-bit system, but rather some assembly in sysdeps/i386/fpu/e_pow.S. There is a function __ieee754_pow_sse2 in more recent glibc versions, but it's in sysdeps/x86_64/fpu/multiarch/e_pow.c and therefore probably only works on x64. However, all of this might be irrelevant here, because pow() is also a gcc built-in function. For an easy fix, see Z boson's answer.
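Independently of that answer (this is not taken from it), one workaround that comes to mind is to keep NaN away from the library pow() altogether. Below is a minimal sketch; guardedPow is a name I made up, and I have not benchmarked it on the setup above. It tries to preserve the special cases that C99 Annex F specifies for pow(), namely that pow(x, 0) and pow(1, y) return 1 even when the other argument is NaN.

```cpp
#include <cmath>
#include <limits>

// Hypothetical wrapper (not part of glibc or gcc): answers the NaN cases
// directly so the potentially slow library pow() never sees a NaN.
inline double guardedPow(double x, double y)
{
    // C99 Annex F: pow(x, +-0) is 1 for any x, and pow(+1, y) is 1 for any y,
    // even if the other argument is NaN.
    if (y == 0.0 || x == 1.0)
        return 1.0;
    // A NaN is the only value that compares unequal to itself.
    // (This check can be optimized away under -ffast-math.)
    if (x != x || y != y)
        return std::numeric_limits<double>::quiet_NaN();
    return std::pow(x, y);
}
```

In the test program, powTime() would then call guardedPow(d,2) and guardedPow(d,0.5) instead of pow(). The extra comparisons add some cost to every non-NaN call, so whether this is a net win would have to be measured.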