Number precision error in C

Question

Here is the code I wrote:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    double num;
    int tmp;
    printf("enter a number!\n");
    scanf("%lf",&num);
    tmp=num*10000;
    printf(" temp=%d\n",tmp);

    return 0; 
}

When I enter the number 1441.1441, the result I'm getting is 14411440 instead of 14411441, which is obviously the correct result of multiplying my input by 10000. Can someone help me figure out this problem?

What happens if you change `tmp` to be a `double` and also change `10000` to `10000.0`? – Nikos C., Nov 23 '12 at 08:36
The same problem is reproduced for me. I'm using Linux on a 32-bit system. – MOHAMED, Nov 23 '12 at 08:38
This is such a FAQ... [read this](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). – Lundin, Nov 23 '12 at 10:16
I am using Windows 7 64-bit, and I've also tried Windows 8 32-bit, with the same result. Lundin, could you be more precise? – Rona Hirsch, Nov 23 '12 at 12:51

paxdiablo · Accepted Answer · 2012-11-23T14:04:53.283

11

Since the vast majority of real numbers cannot actually be represented exactly, you'll probably find that 1441.1441 is actually stored as something like 1441.14409999_blah_blah_blah. You can find that out by inserting:

printf ("%.50lf\n", num);

immediately after the scanf and seeing (trailing zeroes removed):

1441.14409999999998035491444170475006103515625

Now that's actually the correct (i.e., closest) value based on your input. The next higher representable value is:

1441.144100000000207728589884936809539794921875

The error with the first value is:

0.00000000000001964508555829524993896484375
               ^ ~ 2 x 10^-14

while the error with the second is:

0.000000000000207728589884936809539794921875
              ^ ~ 2 x 10^-13

and you can see the latter error is about 10 times as much.
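
If you want to check the two neighbouring values yourself, here is a minimal sketch. It assumes IEEE 754 binary64 doubles and a positive, finite input, so that adding 1 to the bit pattern gives the next representable value. The names `next_up_positive` and `print_neighbours` are made up for this illustration; the standard `nextafter` from `<math.h>` does the same job more generally.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Next representable double above x.
   Assumption: IEEE 754 binary64, and x is positive and finite. */
double next_up_positive(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy out the bit pattern */
    bits += 1;                        /* adjacent bit pattern = adjacent double */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Prints x and its upper neighbour with 50 decimals. */
void print_neighbours(double x)
{
    printf("%.50f\n", x);
    printf("%.50f\n", next_up_positive(x));
}
```

Calling `print_neighbours(1441.1441)` should show the stored value and the next one up. How many of the printed digits are exact depends on the C library.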

When you multiply that by 10000 and try to shoehorn it into an int, it gets rounded down (truncated). That's because the (C11) standard has this to say in 6.3.1.4:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero).

One thing you can try is to change your shoehorning line into:

tmp = num * 10000 + 0.5;

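which effectively turns the truncation into a rounding operation (for non-negative values; negative values would need `- 0.5` instead, since the conversion truncates toward zero). I think that will work for all such cases, but you may want to test it (and keep an eye on it) just in case.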
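
As a hedged sketch of the same idea that also copes with negative input: because the conversion truncates toward zero, the 0.5 has to be subtracted rather than added when the scaled value is negative. The function name `scale_and_round` is hypothetical, and the code assumes the scaled value fits in an `int`. The standard `lround` from `<math.h>` is a library alternative with the same halves-away-from-zero behaviour.

```c
/* Scales num by 10000 and rounds to the nearest int (halves away from zero).
   Assumption: the scaled value fits in an int; otherwise the conversion
   is undefined behaviour. */
int scale_and_round(double num)
{
    double scaled = num * 10000;
    /* Truncation goes toward zero, so push the value away from zero first. */
    return (int)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}
```

In the original program this would replace the assignment with `tmp = scale_and_round(num);`.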

edited Nov 23 '12 at 14:04

answered Nov 23 '12 at 08:38

paxdiablo


He entered the float number as input; it was not produced by an arithmetic operation. – MOHAMED Nov 23 '12 at 08:40
if you change tmp to float you will get the following result: `temp=14411441.000000` – MOHAMED Nov 23 '12 at 08:42
@Mohamed KALLEL: a number that can't be represented exactly can't be represented exactly, whatever the method you used to "input" it. – Mat Nov 23 '12 at 08:42
+1, and if you ever have a spare couple of hours, I highly advise a read of [this document](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html), which explains everything you wanted to know, and a lot that you didn't, about floating-point representations. – WhozCraig Nov 23 '12 at 08:43
Since some people are getting different results: the difference has to be in scanf. Apparently some implementations will round up to the higher adjacent double value (or to the nearest), and some will round down to the lower one. – Medo42 Nov 23 '12 at 08:48
@Medo42 This information is the most insightful of the whole question, and I wish it appeared prominently in an answer instead of just in a comment. – Pascal Cuoq Nov 23 '12 at 09:20
I'm not sure why you mention floats. These variables are double rather than single precision, and have more than enough precision bits to represent every 32-bit value. – paxdiablo Nov 23 '12 at 11:29
@Medo42 Another explanation is that the multiplication is performed at greater precision than `double` on some systems, but not on others. If you multiply at `double` precision, the result is exactly 14411441. – Daniel Fischer Nov 23 '12 at 13:16

score 1 · Answer 2 · edited May 23 '17 at 12:04

For the general principle, paxdiablo's answer contains the relevant parts. Most terminating decimal fractions cannot be exactly represented as binary floating point numbers, hence the value of the floating point variable is a little smaller or larger than the mathematical value of the number representation in the given string, so when you want to get the appropriate integer value after scaling, you should round and not truncate.

But in the specific example here, we have a different scenario. The closest IEEE754 double precision (64-bit binary) value to 1441.1441 is

1441.14409999999998035491444170475006103515625

which is indeed a little smaller than 1441.1441. But if that value is multiplied by 10000 as an IEEE754 double precision value, the result is exactly

14411441

What happens here is that, as is allowed per 5.2.4.2.2 paragraph 9

Except for assignment and cast (which remove all extra range and precision), the values yielded by operators with floating operands and values subject to the usual arithmetic conversions and of floating constants are evaluated to a format whose range and precision may be greater than required by the type.

(emphasis mine), the product is evaluated with a greater precision than required by the type (probably the x87 80-bit format), yielding a slightly smaller value, and when the result of the multiplication is converted to int, the fractional part is discarded, and you get 14411440.

scanf("%lf",&num);

The value is stored in num, so it must have exactly the precision of double.

tmp=num*10000;

The product num * 10000 is neither stored nor cast to double, so it may have greater precision, resulting in a smaller or larger value than the closest double value. That value is then truncated to obtain the int.

If you stored the product in a double variable

num *= 10000;
tmp = num;

or cast it to double before converting to int,

tmp = (double)(num * 10000);

you ought to get the result 14411441 for the input 1441.1441 (but note that not all compilers always honour the requirement of converting to the exact required precision when casting or storing - violating the standard - so there's no guarantee that that will produce 14411441 with all optimisation settings).

Since many 64-bit platforms perform floating-point arithmetic using SSE instructions rather than the x87 coprocessor, the observed behaviour is less likely to appear on 64-bit systems than on 32-bit systems.
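
One way to see which case applies to a given compiler and set of options is the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below only reports what the implementation claims, and the function name `eval_method_description` is made up for this example.

```c
#include <float.h>

/* Describes how the implementation says it evaluates floating expressions. */
const char *eval_method_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "operations are evaluated in the precision of their type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}
```

A value of 2 is what you would typically expect from x87 code generation, and 0 from SSE arithmetic, but it is worth checking with the actual build flags in use.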

score 0 · Answer 3 · answered Nov 23 '12 at 08:43

Try rounding it like this:

float a = 3.14;

int i = (int)(a+0.5);

In your case:

 double num;
 int tmp;
 printf("enter a number!\n");
 scanf("%lf",&num);
 tmp=(int)(num*10000 + 0.5);
 printf(" temp=%d\n",tmp);

answered Nov 23 '12 at 08:43

0x90


score -4 · Answer 4 · answered Nov 23 '12 at 08:56

It looks like scanf is using float precision internally. I briefly checked that 1441.1441 is represented in float as 1441.1440. In general, you shouldn't rely on precision in floating-point operations.

answered Nov 23 '12 at 08:56

son of the northern darkness


I don't think that is the case. It would be a violation of the contract of `%lf`. – UmNyobe Nov 23 '12 at 09:11
I checked it in my Java project. – son of the northern darkness Nov 23 '12 at 09:36
`System.out.println((float)1441.1441);` prints 1441.144 – son of the northern darkness Nov 23 '12 at 09:36
