I tried to compute a float multiplication and observed that the value saturates at 65536 and stops updating.

The issue occurs only with the code below.

I tried this with an online GCC compiler and the issue was the same.

Does this have anything to do with float precision? Is the compiler optimizing away float precision during the operation?

Are there any compiler flags I can add to overcome this issue?

Can anyone please guide me on how to solve this issue?

The code is attached for reference:

#include <stdio.h>

int main()
{
    float dummy1, dummy2;
    unsigned int i = 0;

    printf("Hello World");
    printf("size of float = %ld\n", sizeof(dummy1));

    dummy2 = 0.0;
    dummy1 = 65535.5;

    dummy2 = 60.00 * 0.00005;

    for (i = 0; i < 300; i++)
    {
        dummy1 = dummy1 + dummy2;
        printf("dummy1 = %f   %f\n", dummy1, dummy2);
    }

    return 0;
}
Sam
  • The result of `sizeof` is a value of the type `size_t`, which needs to be printed with `%zu`. While it shouldn't affect the behavior of your program, it does technically have *undefined behavior* because you use mismatching format specifier and argument when printing the size. – Some programmer dude Oct 28 '21 at 07:18
  • Also, you show three *different* programs, two of them as images (please don't post images of text, least of all code). Which of the three programs is your minimal reproducible example? Also please take some time to read [the help pages](http://stackoverflow.com/help), take the SO tour, read How to Ask, as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). – Some programmer dude Oct 28 '21 at 07:21
  • Do not show source code or program output as images. Your program outputs text, which is easy to copy and paste into the question. – Eric Postpischil Oct 28 '21 at 11:29

1 Answer

(This answer presumes IEEE-754 single and double precision binary formats are used for float and double.)

60.00 * 0.00005 is computed with double arithmetic and produces 0.003000000000000000062450045135165055398829281330108642578125. When this is stored in dummy2, it is converted to 0.0030000000260770320892333984375.

In the loop, dummy1 eventually reaches the value 65535.99609375. Then, when dummy1 and dummy2 are added, the result computed with real-number arithmetic would be 65535.9990937500260770320892333984375. This value is not representable in the float format, so it is rounded to the nearest value representable in the float format, and that is the result that the + operator produces.

The nearest representable values in the float format are 65535.99609375 and 65536. Since 65536 is closer to 65535.9990937500260770320892333984375, it is the result.

In the next iteration, 65536 and 0.0030000000260770320892333984375 are added. The real-arithmetic result would be 65536.0030000000260770320892333984375. This is also not representable in float. The nearest representable values are 65536 and 65536.0078125. Again 65536 is closer, so it is the computed result.

From then on, the loop always produces 65536 as a result.

You can get better results either by using double arithmetic or by computing dummy1 afresh in each iteration instead of accumulating rounding errors from iteration to iteration:

for (i = 0; i < 300; ++i)
{
    dummy1 = 65535.5 + i * 60. * .00005;
    printf("%.99g\n", dummy1);
}

Note that because dummy1 is a float, it does not have the precision required to distinguish some successive values of the sequence. For example, output of the above includes:

65535.9921875
65535.99609375
65535.99609375
65536
65536.0078125
65536.0078125
65536.0078125
65536.015625
65536.015625
65536.015625
Eric Postpischil