
Let's say that I have a single-precision floating-point variable on my machine and I want to assign to it the result of a given operation. From Wikipedia:

The IEEE 754 standard specifies a binary32 as having:

  • Sign bit: 1 bit
  • Exponent width: 8 bits
  • Significand precision: 24 bits (23 explicitly stored)
    This gives from 6 to 9 significant decimal digits precision.

It is not clear to me how the last claim (6 to 9 significant decimal digits) is derived. In general, given a data type such as float32 above, or float64, how can one find out its precision limit in base 10?

user1172131
    Sigh. [It is wrong.](https://stackoverflow.com/questions/61609276/how-to-calculate-float-type-precision-and-does-it-make-sense/61614323#61614323) – Eric Postpischil Dec 10 '21 at 18:53

1 Answer

The basic math is this: you look for the n such that

2^24 = 10^n

You can solve that by taking logarithms:

24*log(2) = n*log(10)

Taking logs in base 10, that gives about

n = 7.22...

So, about 7 (decimal) digits of precision.
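A quick check of the arithmetic above (not part of the original answer, just a sketch):

```python
import math

# Solve 2^24 = 10^n for n, i.e. n = 24 * log10(2):
n32 = 24 * math.log10(2)
print(n32)  # ~7.22 -> about 7 decimal digits for binary32

# The same reasoning for binary64, whose significand is 53 bits
# (52 explicitly stored plus the implied leading 1):
n64 = 53 * math.log10(2)
print(n64)  # ~15.95 -> about 15-16 decimal digits for binary64
```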

Note that digitus means finger in Latin, so digits should naturally be decimal.

aka.nice
  • Thanks for the answer; but why are the significant bits only 24? In other words, why is there a 24 in the exponent of the first equality? If I understand correctly, it is a sort of upper bound on the precision, meaning that in the worst-case scenario (exponent factor equal to 1) we can rely only on the precision given by the remaining 23 bits. Is this reasoning true, or am I missing something? – user1172131 Dec 10 '21 at 14:32
  • The implied leading 1 is still part of the number; the number is defined by a sequence of 24 bits (except for zero and gradual underflow). Say in decimal you have 7.85: it's still a number with 3 digits. Same for 1.01 in binary: it's still 3 bits. – aka.nice Dec 13 '21 at 13:12
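The implied leading bit described in the comment can be inspected directly from a float32's bit pattern; a minimal sketch in Python (the variable names are my own):

```python
import struct

# Pack 1.5 as an IEEE 754 binary32 and read back its raw 32 bits.
bits = struct.unpack('>I', struct.pack('>f', 1.5))[0]

sign     = bits >> 31           # 1 bit
exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
fraction = bits & 0x7FFFFF      # 23 explicitly stored bits

# 1.5 is 1.1 in binary: the leading 1 is NOT stored, only the
# fraction .1 is, so the stored field is 0b100...0 (1 << 22).
print(sign, exponent - 127, bin(fraction))

# Reconstruct the value: (1 + fraction/2^23) * 2^(exponent-127)
value = (1 + fraction / 2**23) * 2 ** (exponent - 127)
print(value)  # 1.5
```

This is why the significand contributes 24 bits of precision even though only 23 are stored.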