Single Precision math slower than Double Precision in FFTW?

Question

I am looking at the benchmarks of the FFTW library and wondering why double-precision math would be faster than single-precision math (even on 32-bit hardware).

Where on that page does it show double precision is faster than single precision? — talonmies, Nov 22 '13 at 18:14
Well, you can pick any use case there. Say 1.65 GHz IBM Power5 32-bit mode and compare "double-precision complex, 1d transforms" with "single-precision complex, 1d transforms". You can see that double-precision math is slightly better (faster) than its counterpart. — VJ Vélan Solutions, Nov 22 '13 at 18:23
I looked at the Core Duo 64 bit results - 1D single precision complex about 14000 MFlops peak, double precision about 9500 MFlops peak. So what is your question again? — talonmies, Nov 22 '13 at 18:57
Well, I would expect single precision to have a lesser peak value. A double is 8 bytes (typically) and a float is 4 bytes, so I would expect FFTW to complete sooner on float[] data than on double[] data on a 32-bit machine. Am I missing something obvious? Thx. — VJ Vélan Solutions, Nov 22 '13 at 19:16
I see. It's clear after reading the definition http://www.fftw.org/speed/method.html. Thx for pointing it out. — VJ Vélan Solutions, Nov 22 '13 at 19:58
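
For readers with the same confusion: the benchmark plots show "mflops", where a higher value means a faster transform. The methodology page linked above defines this for complex transforms as 5 N log2(N) divided by the time for one FFT in microseconds. The following is a minimal sketch of that conversion; the function name `fftw_mflops` and the timings are made up for illustration and are not taken from the benchmark data.

```python
import math


def fftw_mflops(n, microseconds_per_fft):
    """Scaled 'mflops' for a complex FFT of size n: 5*n*log2(n) / time in microseconds."""
    return 5.0 * n * math.log2(n) / microseconds_per_fft


if __name__ == "__main__":
    n = 1024
    # Hypothetical timings, for illustration only.
    faster = fftw_mflops(n, 4.0)   # shorter time per FFT
    slower = fftw_mflops(n, 8.0)   # longer time per FFT
    print(faster, slower)          # the shorter time yields the larger value
```

So a curve with a higher peak, such as the single-precision one in the Core Duo results quoted above, corresponds to less time per transform, not more.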

score 1 · Accepted Answer · answered Nov 22 '13 at 21:44

Assuming Intel CPUs, it all depends on the compiler. When compiling 32-bit applications, you can use ordinary i87 (x87) floating point, where single and double precision run at the same speed. Alternatively, you can select SSE for SP and SSE2 for DP, where SSE (4 words per register) is twice as fast as SSE2 (2 words per register).

When compiling for 64 bits, i87 instructions are not available, so floating point is always compiled to use SSE/SSE2. Depending on the compiler or the particular program, the code can be compiled as SIMD (Single Instruction Multiple Data, 4 or 2 words at a time) or as SISD (Single Instruction Single Data, one word per register). In the SISD case, I suppose, SP and DP will run at a similar speed, and the code can be slower than a 32-bit compilation.
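
The "4 words versus 2 words" figures follow directly from the 128-bit width of an SSE register. As a rough sketch (the helper name `lanes_per_register` is hypothetical, and it assumes the usual 4-byte float and 8-byte double):

```python
import ctypes

SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes_per_register(c_type, register_bits=SSE_REGISTER_BITS):
    """Number of values of the given C type that fit in one SIMD register."""
    return register_bits // (8 * ctypes.sizeof(c_type))


if __name__ == "__main__":
    print("float lanes: ", lanes_per_register(ctypes.c_float))
    print("double lanes:", lanes_per_register(ctypes.c_double))
```

With twice as many lanes per instruction, fully vectorized SP code can in principle do twice the arithmetic per instruction, which is where the "twice as fast" estimate comes from.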

When the data comes from RAM, and possibly from cache, performance can be limited by bus speed, in which case SP will be faster than DP because each value is half the size. If the code is like my FFT benchmarks, it depends on skipped sequential reading and writing. Speed is then affected by data being read in bursts of at least 64 bytes, so SP is likely to be only a little faster.
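
To see why the SP advantage shrinks with skipped access, one can count how many 64-byte bursts a strided read touches. This is only a sketch under simplifying assumptions: the helper `cache_lines_touched` is hypothetical, the array is assumed to start on a 64-byte boundary, and real hardware (prefetching, multiple cache levels) is ignored.

```python
def cache_lines_touched(n_elements, stride, element_bytes, line_bytes=64):
    """Count distinct 64-byte lines hit when reading n_elements at a fixed stride.

    The stride is measured in elements; the array is assumed line-aligned.
    """
    return len({(i * stride * element_bytes) // line_bytes
                for i in range(n_elements)})


if __name__ == "__main__":
    n = 1024
    for stride in (1, 4, 16):
        sp = cache_lines_touched(n, stride, 4)  # 4-byte single precision
        dp = cache_lines_touched(n, stride, 8)  # 8-byte double precision
        print(f"stride {stride:2d}: SP lines = {sp}, DP lines = {dp}")
```

At stride 1, SP touches half as many lines as DP. Once the stride is large enough that every access lands in its own line, both precisions transfer the same number of bursts and the memory-side advantage of SP disappears.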

Functions such as trig functions are often calculated in DP. SP is then a bit slower because of the DP-to-SP conversion.

I don't think your claim that i87 instructions are not available in 64-bit mode is true. The compiler might not use them, but that does not mean they can't be used. I think GCC may still use them, but MSVC does not in 64-bit mode. Also, if you're only using SSE2 for a single float or double, then most of the operations (add, sub, mul, ...) are the same speed, just as with i87. It's only slower for some math operations like sqrt. — Z boson, Dec 03 '13 at 07:39
