I was trying to figure out if using floats in some code in C would be precise enough for my needs, but after searching and not really understanding how bits of precision translated to actual numbers, I decided just to write a bit of code for my test case and see what the results were.

Floats seem precise enough, but I was quite surprised that the float version took about 70% longer to run on my i7-4700HQ Haswell processor (Windows 8.1 x64, C, MSVS v120). I would have expected the running times to be similar, or floats to be faster. Clearly not. So I turned off all optimizations: still the same. I tried the debug build: still the same performance gap. Builds targeting AVX2 and SSE3 all showed it too.

Doubles take about 197 seconds to run and floats 343 seconds.

I've glanced through the Intel® 64 and IA-32 Architectures Software Developer’s Manual, but considering its size and my lack of expertise, I've yet to glean any answers from it concerning this. Then I took a look at the disassembly of both, but I didn't notice any glaring differences to my untrained eyes.

So, anyone know why this is the case? Here's the code I've used, with the only changes being from doubles to floats for all but the anError variable.

#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <omp.h>



int main( void ) {

    clock_t start = clock() ;

    // body of a candle
    double open = 0.500000 ;
    double close = 0.500001 ;
    double end = 1 ;

    uint64_t resultCounter = 0 ;
    double anError = 0 ;

    while (open < end){
        while(close < end){
            // count the number of times the result is positive. Should be 0.
            double res = open - close ;
            if (res > 0 ) { 
                resultCounter++ ; 
                if (anError < fabs( res )) { anError = res ;    }
            }
            close = close + 0.000001 ;
        }
        open = open + 0.000001 ;
        close = open + .000001 ;
    }

    clock_t finish = clock() ;
    double duration = ((double) (finish - start)) / CLOCKS_PER_SEC;
    double iterations = (((end - .50000) / .000001) * ((end - .50000) / .000001)) ;
    fprintf( stdout, "\nTotal processing time was %f seconds.\n", duration ) ;
    fprintf( stdout, "Error is %f. Number of times results were incorrect %llu out of %f iterations.\n", 
        anError, resultCounter, iterations ) ;

    return 0  ;
}

EDIT: The lack of f at the end of the numbers seems to be the cause (thanks Joachim!). Apparently a float constant without the f suffix is actually a double! Another of C's quirks that likes to bite the ignorant in the butt. Not sure what the rationale behind this oddity is but shrug. If anyone wants to write up a good answer to this so I can accept it, feel free.

Jason White
  • In the program using `float` instead, do you remember to use e.g. `0.000001f` instead of just plain `0.000001`? Or using [`fabsf`](http://en.cppreference.com/w/c/numeric/math/fabs) instead of `fabs`? Otherwise the compiler will have to add code to convert numbers between `float` and `double`. – Some programmer dude Sep 04 '15 at 20:29
  • Also, for a fairer test you should really run the calculation loop multiple times, and get an average. – Some programmer dude Sep 04 '15 at 20:30
  • Hmm. If after elimination of the implicit conversions (using f*() math functions) there are any differences left, the results will heavily depend on compiler, CPU and even if you are compiling 32bit or 64bit. On x86 with SSE support, the SSE registers are 128bit wide, so you get the conversion either way. The 8 hardware floating point registers (now probably emulated with SSE) before that were 80 bit wide, so there's no "fit" there either. – dhke Sep 04 '15 at 20:45
  • Post the `float` version too. – chux - Reinstate Monica Sep 04 '15 at 20:47
  • MSVC has a bad habit of promoting intermediates to `double` when you use `fp:precise`. Try `fp:fast` instead. – Mysticial Sep 04 '15 at 20:47
  • So, why is float number = 1.0233 essentially a double if you don't type it out as 1.0233f? Shouldn't float be enough? – Jason White Sep 04 '15 at 21:11
  • @dhke: If you believe the 128-bit SSE are native quad-precision, I am not sure whether Intel's marketing has failed horribly or succeeded magnificently. – EOF Sep 04 '15 at 21:27
  • because any constant number will be converted by the compiler to some floating-point number that is as close as possible to it. It is not guaranteed that the converted number and the original number are exactly equal; there will most likely be some difference or error. Converting to double (instead of float) minimizes this error. That is why compilers convert to double, in order to translate the code as reliably as possible – A.S.H Sep 04 '15 at 21:28
  • The problem is probably not in assigning constants, but performing arithmetic on them. If either operand is a double then the float will be converted to double, and the result will convert back to float. When you assign a variable to a constant, the conversion will generally be done at compile time rather than run time. – Mark Ransom Sep 04 '15 at 21:51
  • @JasonWhite -- Updating the question with the answer is considered bad style -- Instead of EDITing in the answer to be part of the question, you should post a proper answer and accept your own answer -- that is if Joachim does not expect to post an answer and just leave the comment – Soren Sep 04 '15 at 21:54
  • @EOF: The registers are 128bit wide, which facilitates a conversion. I haven't claimed a thing about actual numeric precision during calculation (which is different, since you stuff 2 doubles into xmm?). Please don't interpret assertions that aren't there. – dhke Sep 04 '15 at 22:04
  • @dhke: `Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z Page 3-574 MOVSS—Move Scalar Single-Precision Floating-Point Values`. You can use single-precision floats in SSE, no conversion needed. – EOF Sep 05 '15 at 08:15
  • @EOF Maybe *conversion* did ring the wrong bell, because you are *still* interpreting a statement I did not make. The SSE fill and spill instructions are different for single and double precision, and you can use different ones (floats can even be quadloaded if properly aligned). And so it depends what the code is compiled to. – dhke Sep 06 '15 at 15:36

1 Answer

According to the C standard:

An unsuffixed floating constant has type double. If suffix is the letter f or F, the floating constant has type float. If suffix is the letter l or L, the floating constant has type long double.

More details about floating point constants here. So:

num is just a float

float num = 1.0f;
float num = 1.0F;

a double gets converted to float and stored in num

float num = 1.0;

a float gets converted to double and stored in num

double num = 1.0f;
double num = 1.0F;

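The performance is worse with floats because the constants in the code are still doubles. In an expression such as `close + 0.000001`, the float operand is first converted to double, the addition is performed in double precision, and the result is converted back to float when it is assigned. Initializing a variable from a constant is usually converted at compile time, but these arithmetic conversions happen at run time on every iteration of the loop. Adding the f suffix (and calling `fabsf` instead of `fabs`) keeps the whole computation in float.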

Manos Nikolaidis