Converting a really large int to double: loss of precision on some computers

Question

I'm currently learning about conversion between data types in C++. I have been taught that:

For a really large int, we can (for some computers) suffer a loss of precision when converting to double.

But no reason was provided for the statement.

Could someone please provide an explanation and an example? Thanks

This depends on how large the `int` is. According to IEEE 754, a [`double`](https://en.wikipedia.org/wiki/IEEE_754) has a mantissa of 53 bits, so every 32-bit `int` value can be stored without loss. If 64-bit integers are supported (e.g. `long int` on a 64-bit platform), a value is only guaranteed to fit into the mantissa if it has at least 11 leading 0s. (Strictly, only 52 of those 53 bits are stored; the leading 1 is implicit and provides the extra bit of precision.) - Scheff's Cat, Sep 25 '18 at 05:49

eerorika · Accepted Answer · 2018-09-25T07:20:07.103

Let's say that the floating point number uses N bits of storage.

Now, let us assume that this float can precisely represent every integer that an N-bit integer type can represent. Since the N-bit integer needs all of its N bits to distinguish its 2^N values, the float would also need all of its N bits just for those integers.

A floating point number should also be able to represent fractional numbers. However, since all of the bit patterns are used up by the integers, none are left to represent any fractional number. This is a contradiction, so the assumption that a float can precisely represent every value of an equally sized integer type must be wrong.

Since there must be non-representable integers in the range of an N-bit integer, converting such an integer to an N-bit floating point number loses precision whenever the converted value happens to be one of the non-representable ones.
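To make the argument concrete for N = 32, here is a minimal sketch. It assumes that float is the 32-bit IEEE-754 single precision format with a 24-bit mantissa, which the C++ standard does not require, and `survives_float_round_trip` is a hypothetical helper name chosen for illustration.

```cpp
#include <cstdint>
#include <limits>

// Assumption: float is IEEE-754 binary32 with a 24-bit mantissa.
static_assert(std::numeric_limits<float>::digits == 24,
              "this sketch assumes a 24-bit float mantissa");

// Hypothetical helper: true if value survives int32 -> float -> int32 unchanged.
bool survives_float_round_trip(std::int32_t value) {
    const float f = static_cast<float>(value);
    // Casting a float >= 2^31 back to int32_t would be undefined behaviour.
    // That can only happen here if the conversion already rounded upwards.
    if (f >= 2147483648.0f) return false;
    return static_cast<std::int32_t>(f) == value;
}
```

Under that assumption, `survives_float_round_trip(16777216)` (2²⁴) holds, while `survives_float_round_trip(16777217)` (2²⁴ + 1) does not, because 2²⁴ + 1 needs 25 significant bits.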

Now, since a floating point type can represent a subset of the rational numbers, some of those representable values may indeed be integers. In particular, the IEEE-754 spec guarantees that a binary double precision floating point number can represent all integers with a magnitude of up to 2⁵³. This property follows directly from the length of the mantissa (53 bits, counting the implicit leading bit).
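The 2⁵³ boundary can be checked directly. The following is only a sketch, assuming double is the IEEE-754 binary64 format; `first_inexact` is a hypothetical helper that scans a range and reports the first integer that does not survive a conversion to double and back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the first n in [start, limit) for which
// int64 -> double -> int64 changes the value, or -1 if there is none.
std::int64_t first_inexact(std::int64_t start, std::int64_t limit) {
    // Keep the range well inside int64_t so that casting the double back
    // is always well defined (a double >= 2^63 must not be cast back).
    assert(start >= -(INT64_C(1) << 62) && limit <= (INT64_C(1) << 62));
    for (std::int64_t n = start; n < limit; ++n) {
        const double d = static_cast<double>(n);
        if (static_cast<std::int64_t>(d) != n) return n;
    }
    return -1;
}
```

Scanning from 2⁵³ - 1000 to 2⁵³ + 1000 should report 2⁵³ + 1 as the first integer that gets rounded, since it needs 54 significant bits.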

Therefore, a 32-bit integer cannot lose precision when it is converted to a double on a system that conforms to IEEE-754.
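If you would rather not rely on knowing the formats, the same conclusion can be checked at compile time with std::numeric_limits. This is a sketch: `every_value_exact` is a hypothetical name, and the test is deliberately conservative (it only compares mantissa width and exponent range).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if every value of Int is exactly representable
// in Float, judged by mantissa width and exponent range alone.
template <typename Int, typename Float>
constexpr bool every_value_exact() {
    return std::numeric_limits<Float>::radix == 2
        && std::numeric_limits<Float>::digits >= std::numeric_limits<Int>::digits
        && std::numeric_limits<Float>::max_exponent > std::numeric_limits<Int>::digits;
}
```

On an IEEE-754 system, `every_value_exact<std::int32_t, double>()` is true and `every_value_exact<std::int64_t, double>()` is false, which matches the reasoning above.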

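More technically, the x87 floating point unit of the x86 architecture uses an 80-bit extended floating point format, whose 64-bit mantissa can precisely represent all 64-bit integers. On compilers that map the long double type to this format (GCC on x86, for example), it can be accessed using long double; other compilers (MSVC, for example) make long double the same size as double.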
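Because the width of long double depends on the compiler and platform, it is safer to test than to assume. A minimal sketch, with hypothetical helper names:

```cpp
#include <cstdint>
#include <limits>

// True if long double has a binary mantissa of at least 64 bits, which is
// enough for every std::uint64_t value.
bool long_double_holds_all_uint64() {
    return std::numeric_limits<long double>::radix == 2
        && std::numeric_limits<long double>::digits >= 64;
}

// True if value survives uint64 -> long double -> uint64 unchanged.
bool survives_long_double_round_trip(std::uint64_t value) {
    const long double ld = static_cast<long double>(value);
    // 2^64: casting a value this large back to uint64_t is undefined,
    // and it can only appear here if the conversion rounded upwards.
    if (ld >= 18446744073709551616.0L) return false;
    return static_cast<std::uint64_t>(ld) == value;
}
```

Where long double is the 80-bit x87 format, `survives_long_double_round_trip(UINT64_MAX)` should hold; where long double is just a 64-bit double, it should not.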

score 6 · Answer 2 · answered Sep 25 '18 at 05:49

This may happen if int is 64 bits wide and double is 64 bits wide as well. Floating point numbers are composed of a mantissa (which represents the digits) and an exponent. Since the mantissa of the double has fewer bits than the int in such a case, the double can represent fewer digits, and a loss of precision happens.
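One way to see the missing digits is to measure the distance between a double and the next larger double. The sketch below assumes an IEEE-754 binary64 double; `gap_above` is a hypothetical helper built on std::nextafter.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: distance from x to the next representable double above it.
double gap_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

With a 53-bit mantissa, `gap_above(std::ldexp(1.0, 52))` is 1, `gap_above(std::ldexp(1.0, 53))` is 2, and `gap_above(std::ldexp(1.0, 60))` is 256. So near 2⁶⁰ only one integer in every 256 has an exact double, and all the others are rounded to a neighbour.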
