
Multiplication using float gives a noticeable difference


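import java.math.BigDecimal;

import static java.lang.Double.parseDouble;
import static java.lang.Float.parseFloat;

public class FloatVsDouble {
    public static void main(String[] args) {
        // In the real use case the values arrive as strings, so they are parsed here as well

        double v1 = parseDouble("590.0");
        double v2 = parseDouble("490.0");
        double v3 = parseDouble("391.0");

        float v4 = parseFloat("590.0");
        float v5 = parseFloat("490.0");
        float v6 = parseFloat("391.0");

        // new BigDecimal(double) shows the exact binary value of each product
        System.out.println(new BigDecimal(v1 * v2 * v3)); // double multiplication
        System.out.println(new BigDecimal(v4 * v5 * v6)); // float multiplication (widened to double)

        // BigDecimal.valueOf goes through Double.toString, so toPlainString() avoids output like 1.130381E+8
        System.out.println(BigDecimal.valueOf(Float.parseFloat("289100.0") * Float.parseFloat("391.0")).toPlainString());
        System.out.println(BigDecimal.valueOf(Double.parseDouble("289100.0") * Double.parseDouble("391.0")).toPlainString());
    }
}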

Output:

113038100 // double multiplication
113038096 // float multiplication
113038096
113038100

For the above code,

(590.0 * 490.0 * 391.0) gives 113038100 using double

(590.0 * 490.0 * 391.0) gives 113038096 using float (113038100 - 113038096 = 4, a difference of 4)

I have read https://floating-point-gui.de/basic/ and understand how floating-point calculation works in general; however, a difference of 4 is unexpected.

Please help me understand the following:

  • Is this result correct?
  • Does float always give wrong numbers?
  • Since double uses the same technique, how much of a guarantee do we have of getting a correct result if we use double?
jagadesh
  • @user16320675 - to be honest I don't understand that... it is giving 8.0. Is that the last floating-point digit? And if a difference of 4 is not unexpected, why use float at all? It's like all my life I was fooled... if it's 1.000004, that is acceptable, but 100 vs 104 (e.g.) is completely unacceptable – jagadesh Dec 28 '22 at 14:51
  • That is the difference to the next number that can be represented using `float`, as described in the documentation. "Not acceptable"? Then don't use it. Computers are limited: there is only a **finite** number of bits available, but an infinite number of real numbers (a number like `12345678901234567890` would need more bits than `123`, since much more information is needed, even without decimals) – user16320675 Dec 28 '22 at 14:57
  • @user16320675 - I am definitely not going to use it... Let me rephrase my question: (590.0 * 490.0 * 391.0) gives 113038096. Is this correct? – jagadesh Dec 28 '22 at 15:06
  • @jagadesh This is one of the surprising things about floating point: for large numbers, you will have "roundoff error" to the *left* of the decimal point. Happens all the time, perfectly normal, nothing to worry about. Naturally it's worse for single precision — which is another reason never to use single precision (aka "float"). – Steve Summit Dec 28 '22 at 18:26
  • BTW, for Java (and the IEEE 754 standard it uses), `(590.0 * 490.0 * 391.0)` is more like `1.130381 * 10⁸` or `1.130381E8` – user16320675 Jan 09 '23 at 21:24
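The "difference to the next number" from the comments can be made visible with the standard `java.lang.Math` methods `nextDown`, `nextUp`, and `ulp`. This is a small sketch; the class and helper names are made up for illustration. It prints the float product together with its two neighbouring floats. The exact product 113038100 lies exactly halfway between 113038096 and 113038104, which are 8 apart, so whichever way it rounds, the error is 4.

```java
public class FloatNeighbors {
    // Hypothetical helper: the float just below x, x itself, and the float just above x
    static float[] neighbors(float x) {
        return new float[] { Math.nextDown(x), x, Math.nextUp(x) };
    }

    public static void main(String[] args) {
        float product = 590f * 490f * 391f; // rounded float result
        for (float f : neighbors(product)) {
            // Every float at this magnitude is a whole number, so the cast loses nothing
            System.out.println((long) f);
        }
        // Spacing between consecutive floats at this magnitude
        System.out.println("gap = " + Math.ulp(product));
    }
}
```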

2 Answers


Does float always give wrong numbers?

It depends on the number: if the number can be represented exactly with float precision, the result will be fine.

"Since double uses the same technique, how much of a guarantee do we have of getting a correct result if we use double?"

double has the same issue, but since double has more precision, it happens less often. It can still happen, though.
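For whole numbers, a quick way to check whether a value survives is a round-trip cast through the type. This is a minimal sketch with hypothetical helper names; it only covers integers that fit in a `long`. It shows the question's product failing in float but not in double, and 2⁵³ + 1 failing even in double.

```java
public class Representable {
    // Hypothetical helper: true if n is unchanged after conversion to float and back
    static boolean fitsInFloat(long n) {
        return (long) (float) n == n;
    }

    // Hypothetical helper: true if n is unchanged after conversion to double and back
    static boolean fitsInDouble(long n) {
        return (long) (double) n == n;
    }

    public static void main(String[] args) {
        long[] samples = { 289_100L, 113_038_100L, 113_038_096L, (1L << 53) + 1 };
        for (long n : samples) {
            System.out.println(n + " float=" + fitsInFloat(n) + " double=" + fitsInDouble(n));
        }
    }
}
```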

So when you need a very precise result, as in scientific or financial applications, you will need to use BigDecimal.
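A minimal sketch of the BigDecimal approach, assuming the values arrive as strings as in the question. Parsing them with `new BigDecimal(String)` means they never pass through float or double, and `BigDecimal.multiply` does not round. The helper name `product` is hypothetical.

```java
import java.math.BigDecimal;

public class ExactProduct {
    // Hypothetical helper: multiplies decimal strings exactly (multiply never rounds)
    static BigDecimal product(String... values) {
        BigDecimal result = BigDecimal.ONE;
        for (String v : values) {
            result = result.multiply(new BigDecimal(v)); // parse from String, not from float/double
        }
        return result;
    }

    public static void main(String[] args) {
        BigDecimal p = product("590.0", "490.0", "391.0");
        System.out.println(p.toPlainString());                       // scale 3: one decimal place per input
        System.out.println(p.stripTrailingZeros().toPlainString());  // same value without trailing zeros
    }
}
```

The result keeps a scale of 3 (one decimal place from each input), which is why `stripTrailingZeros()` is applied before printing the plain whole-number value.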

Watch this video; it explains how floating-point numbers work: https://www.youtube.com/watch?v=ajaHQ9S4uTA

  • Yeah... I understand that for an accurate result we need to use BigDecimal. However, isn't it weird that there is a difference of 4 even though none of the numbers used have decimals, or is this expected behaviour? – jagadesh Dec 28 '22 at 14:42
  • @jagadesh floating point applies to the whole number: even if there is no decimal part, the number can still have trouble being represented in binary. At 8:22 the video shows an example of how 3 can be represented, although of course the video does not use the actual float/double precision – Dec 28 '22 at 14:57

Is this correct?

The Java float format is IEEE-754 binary32. In this format, every finite number is represented as a sign, a 24-bit integer, and a scaling by a power of two from 2⁻¹⁴⁹ to 2¹⁰⁴. The integer is called the significand. (The format is often described as a sign, a 24-bit number with a binary point after the first bit, so it has a value in [0, 2), and a scaling from 2⁻¹²⁶ to 2¹²⁷. These are mathematically equivalent, and the format used here is noted in the IEEE-754 standard as an option.) In normal form, the 24-bit integer is 2²³ or greater. (Representable numbers less than 2⁻¹²⁶ cannot be represented in normal form and are necessarily subnormal.)
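The range limits in this integer-significand view can be checked against the constants in `Float`. This sketch builds M•2ᵉ with `Math.scalb`, which is exact here because every result is representable; the helper name `fromParts` is hypothetical.

```java
public class FloatRange {
    // Hypothetical helper: builds significand * 2^exponent as a float
    static float fromParts(int significand, int exponent) {
        return Math.scalb((float) significand, exponent);
    }

    public static void main(String[] args) {
        int maxSignificand = (1 << 24) - 1;  // largest 24-bit integer
        int minNormalSignificand = 1 << 23;  // smallest significand in normal form

        System.out.println(fromParts(1, -149) == Float.MIN_VALUE);                     // smallest subnormal
        System.out.println(fromParts(minNormalSignificand, -149) == Float.MIN_NORMAL); // 2^-126
        System.out.println(fromParts(maxSignificand, 104) == Float.MAX_VALUE);         // largest finite float
    }
}
```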

In this format, 590 can be represented as +590•2⁰ or +9,666,560•2⁻¹⁴. 490 is +490•2⁰ or +16,056,320•2⁻¹⁵.

Their product is +289,100•2⁰ or +9,251,200•2⁻⁵.

391 is +391•2⁰ or +12,812,288•2⁻¹⁵.
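These normal-form pairs can be recovered from the raw bits returned by `Float.floatToIntBits`. The bit layout (1 sign bit, 8 exponent bits biased by 127, 23 stored fraction bits) is the standard binary32 layout; in the 24-bit-integer view above, the scale exponent works out to the biased field minus 150. The helper name and the `long[]` return shape are illustrative choices, not a library API.

```java
public class FloatParts {
    // Hypothetical helper: splits a finite float into {sign, M, e} with |x| == M * 2^e
    static long[] decompose(float x) {
        int bits = Float.floatToIntBits(x);
        int sign = bits >>> 31;             // 1 for negative
        int biased = (bits >>> 23) & 0xFF;  // 8-bit biased exponent field
        int fraction = bits & 0x7F_FFFF;    // 23 stored fraction bits
        if (biased == 0xFF) {
            throw new IllegalArgumentException("NaN or infinity");
        }
        if (biased == 0) {
            // Subnormal or zero: no implicit leading bit, fixed scale 2^-149
            return new long[] { sign, fraction, -149 };
        }
        // Normal: restore the implicit leading 1, so 2^23 <= M < 2^24
        long significand = fraction | (1 << 23);
        return new long[] { sign, significand, biased - 150 };
    }

    public static void main(String[] args) {
        for (float x : new float[] { 590f, 490f, 391f, 289_100f }) {
            long[] p = decompose(x);
            System.out.println(x + " = " + (p[0] == 1 ? "-" : "+") + p[1] + " * 2^" + p[2]);
        }
    }
}
```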

The ordinary arithmetic product of +289,100•2⁰ and +391•2⁰ is +113,038,100•2⁰. However, 113,038,100 is not a 24-bit number; it is a 27-bit number. To get it under 2²⁴, we can adjust the scaling, multiplying the significand by ⅛ and multiplying the scaling by 8 = 2³.

That gives us +14,129,762.5•2³. However, now the significand is not an integer, so this result is not representable in the float format. To produce a result, the multiplication operation in the float format is defined to round the ordinary arithmetic result to the nearest representable value. In this case, there is a tie: we could round the .5 up or down. Ties are resolved by rounding to make the low bit of the significand even, so we round to +14,129,762•2³.

+14,129,762•2³ is 113,038,096. That is the result you got, so it is correct.
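The hand calculation can be replayed in code and compared with what the JVM actually produces. In this sketch the shift of 3 (dividing by 8) is chosen by hand for this 27-bit product, not computed in general, and `Math.rint` is used because it rounds exact ties to the even neighbour, matching the rule described above. The helper name is hypothetical.

```java
public class TieToEven {
    // Hypothetical helper: drop `shift` low bits from an exact integer, rounding half to even
    static long roundToFloatGrid(long exact, int shift) {
        double scaled = exact / (double) (1L << shift); // 113038100 / 8 = 14129762.5, exact in double
        double rounded = Math.rint(scaled);             // rint picks the even neighbour on a tie
        return (long) rounded << shift;                 // scale back up by 2^shift
    }

    public static void main(String[] args) {
        long exact = 289_100L * 391L;                   // exact product in long arithmetic
        System.out.println(roundToFloatGrid(exact, 3)); // hand-rounded result
        System.out.println((long) (289_100f * 391f));   // result of real float multiplication
    }
}
```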

Does always float gives wrong numbers ??

This is not wrong; the computer behaved according to its specification.

Observe that float is a 32-bit format, but there are infinitely many real numbers. There are even infinitely many rational numbers. It is impossible for a 32-bit format to produce the same results as theoretical real-number arithmetic or rational-number arithmetic. There are simply more possible results than there are representable values.

This is true of the 64-bit double format as well. It is also true of integer formats, fixed-precision formats, and all numerical formats with a fixed number of bits. A fixed number of bits cannot represent infinitely many values.
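A small illustration that the limit is not specific to float: a 32-bit `int` silently wraps around, and the 64-bit `double` cannot hold 0.1, 0.2, or 0.3 exactly. The helper names are hypothetical; `Math.addExact` is the standard Java method that reports int overflow with an exception instead of wrapping.

```java
public class FixedWidthLimits {
    // Hypothetical helper: plain int addition, which wraps on overflow without any error
    static int incrementInt(int x) {
        return x + 1;
    }

    // Hypothetical helper: uses Math.addExact to detect the same overflow
    static boolean overflowsInt(int x) {
        try {
            Math.addExact(x, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementInt(Integer.MAX_VALUE)); // wraps to Integer.MIN_VALUE
        System.out.println(overflowsInt(Integer.MAX_VALUE));
        // 0.1, 0.2 and 0.3 are each rounded to the nearest double, and the errors do not cancel
        System.out.println(0.1 + 0.2 == 0.3);
    }
}
```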

Your comments suggest you thought floating-point would produce approximate results only for fractional values, numbers less than one. But the limitation on how many values can be represented applies at all scales. At each scale (each power of two), only 2²⁴ values are representable (2²³ in normal form). For scale 2⁰, all the non-negative integers below 2²⁴ are representable. But above that, only some of the integers are representable. At first, we have to skip every second integer, then every fourth, then every eighth, and so on.
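The growing gap between consecutive floats can be observed directly with `Math.ulp`. This sketch (hypothetical helper name `gapAt`) prints the spacing at a few magnitudes, including the question's product, where it is 8, and shows that 2²⁴ + 1 is the first positive integer float cannot hold.

```java
public class GapGrowth {
    // Hypothetical helper: distance from x to the next larger float.
    // It is 1 in [2^23, 2^24), then doubles at each higher power of two: 2, 4, 8, ...
    static float gapAt(float x) {
        return Math.ulp(x);
    }

    public static void main(String[] args) {
        float[] samples = { 16_777_215f, 16_777_216f, 33_554_432f, 67_108_864f, 113_038_096f };
        for (float x : samples) {
            System.out.printf("%,.0f -> gap %s%n", x, gapAt(x));
        }
        // 2^24 + 1 lies halfway between 2^24 and 2^24 + 2 and rounds to the even one, 2^24
        System.out.println((long) (float) 16_777_217L);
    }
}
```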

Floating-point arithmetic is designed to approximate real-number arithmetic. It should be used when you want to approximate real-number arithmetic. It should not be used, with rare exceptions, when you want exact arithmetic.

Eric Postpischil