Problematic understanding of IEEE 754

Question

First of all, I would like to point out that I am not a native speaker, so I really need explanations that use more common terms.

The second thing I would like to mention is that I am not a math genius. I am really trying to understand everything about programming, but IEEE 754 makes me think that it will never happen. It is full of mathematical terms I don't understand.

What is precision? What is it used for? What is mantissa and what is mantissa used for? How to determine the range of float/double by their size? What is ± symbol (Plus-minus) used for? (I believe it's a positive/negative choice, but what does that have to do with everything?)

Isn't there any brief and clean explanation you guys could provide me with? I spent 600 years trying to understand Wikipedia. I failed tremendously.

there is plenty of info on the internet, eg http://floating-point-gui.de/basic/ — thumbmunkeys, Oct 18 '14 at 15:58
In your 600 years on wikipedia, did you happen to see [this article](http://en.wikipedia.org/wiki/Single-precision_floating-point_format)? — user3386109, Oct 18 '14 at 16:03
Not to insult you but .. http://www.dummies.com/how-to/content/the-real-difference-between-integers-and-floatingp.html — Charlie Burns, Oct 18 '14 at 16:14
What is your native language? From the [main Wikipedia article](http://en.wikipedia.org/wiki/IEEE_floating_point) on the subject, select your preferred language from the language menu in the left-hand side bar. They are not direct translations of each other, but separately authored. You may fare better with one in your own language. — Clifford, Oct 18 '14 at 17:39

Clifford · Accepted Answer · 2014-10-19T10:00:10.350

What is precision?

It refers to how closely a binary floating point representation can represent a real value. Real values have infinite precision and infinite range. Digital values have finite range and precision. In practice an IEEE 754 single-precision value can represent real values to a precision of about 6 significant decimal figures, while double precision is good for about 15 significant figures.

The practical effect of this is, for example, that the single-precision value 123456000.00 cannot be distinguished from, say, 123456001.00, but equally a small value such as 0.00123456 can be represented to the same number of significant figures.
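
As a rough illustration of the point above, here is a minimal sketch in Python (chosen only for brevity; the thread itself is not about Python). Python's own `float` is double precision, so the helper `to_float32` (a made-up name) uses the standard `struct` module to round a value to single precision and back.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = to_float32(123456000.0)
b = to_float32(123456001.0)

# In single precision both inputs end up as the same stored value.
print(a == b)

# In double precision they are still different numbers.
print(123456000.0 == 123456001.0)

# A small value is stored too, but only the first 6 to 7 digits can be trusted.
print(to_float32(0.00123456))
```

The first comparison prints `True` and the second `False`: at this magnitude neighbouring single-precision values are 8 apart, so adding 1 is lost.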

What is it used for?

Precision is not used for anything other than to define a characteristic of a particular floating point representation.

What is mantissa and what is mantissa used for?

The term is not mentioned in the English language Wikipedia article, and is imprecise: in mathematics in general it has a different meaning than that used here.

The correct term is significand. For a decimal value 0.00123456, for example, the significand is 123456. 123456000.00 has exactly the same significand. Each of these values has the same significand but a different exponent. The exponent is a scaling factor which determines where the decimal point is (hence floating point).

Of course IEEE754 is a binary floating point representation, not decimal, but for the sake of explaining the terms it is perhaps easier to use decimal.
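
The decimal picture above can be checked with a short sketch. It assumes Python's standard `decimal` module; the helper name `significand_and_exponent` is invented for this illustration. It strips trailing zeros and returns the remaining digits plus the power of ten that scales them.

```python
from decimal import Decimal

def significand_and_exponent(text: str):
    """Return (digits, power of ten) for a decimal literal, trailing zeros removed."""
    sign, digits, exponent = Decimal(text).normalize().as_tuple()
    return "".join(map(str, digits)), exponent

print(significand_and_exponent("0.00123456"))    # ('123456', -8)
print(significand_and_exponent("123456000.00"))  # ('123456', 3)
```

Both values share the digits 123456; only the exponent differs (123456 × 10^-8 versus 123456 × 10^3).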

How to determine the range of float/double by their size?

By the size alone you cannot; you need to know how many bits are assigned to the significand and how many bits are assigned to the exponent. In C, however, the range is defined by the macros FLT_MIN, FLT_MAX, DBL_MIN and DBL_MAX in the float.h header (note that FLT_MIN and DBL_MIN are the smallest positive normalised values, not the most negative ones). Other characteristics of the implementation's floating point representation are described there also.
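
To show how the range follows from the bit counts, here is a hedged sketch in Python rather than C. The function names are made up, and the field widths passed in (8 exponent bits and 23 stored significand bits for single precision, 11 and 52 for double precision) are the IEEE 754 binary32 and binary64 layouts. The results should correspond to what C reports as FLT_MAX, DBL_MAX, FLT_MIN and DBL_MIN on an IEEE 754 platform.

```python
import struct
import sys

def largest_finite(exponent_bits: int, stored_fraction_bits: int) -> float:
    """Largest finite value of an IEEE 754 binary format with these field widths."""
    bias = 2 ** (exponent_bits - 1) - 1
    max_exponent = bias  # the all-ones exponent pattern is reserved for infinity/NaN
    max_significand = 2 - 2.0 ** -stored_fraction_bits  # binary 1.111...1
    return max_significand * 2.0 ** max_exponent

def smallest_normal(exponent_bits: int) -> float:
    """Smallest positive normalised value (the all-zeros exponent is reserved)."""
    bias = 2 ** (exponent_bits - 1) - 1
    return 2.0 ** (1 - bias)

print(largest_finite(8, 23))    # single precision, compare with FLT_MAX
print(largest_finite(11, 52))   # double precision, compare with DBL_MAX
print(smallest_normal(8))       # compare with FLT_MIN
print(smallest_normal(11))      # compare with DBL_MIN

# Cross-check: the double-precision result matches what Python itself reports.
print(largest_finite(11, 52) == sys.float_info.max)
```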

Note that a specific compiler may not in fact use IEEE754; however, that is the format used by most hardware FPU implementations, and the compiler will naturally follow that. For targets with no FPU (typically small embedded processors), other formats may be used.

What is ± symbol (Plus-minus) used for?

It simply means that the value given may be either positive or negative. It may refer to a specific value, or it may indicate a range. So ±n may refer to the two discrete values -n and +n, or it may mean the range -n to +n. Context is everything! In this article it refers to the discrete values +0, -0, +∞ and -∞.
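
The discrete signed values mentioned above can be observed directly. This is only a small sketch using Python's standard `math` module, whose `float` is an IEEE 754 double on common platforms.

```python
import math

pos_zero, neg_zero = 0.0, -0.0

print(pos_zero == neg_zero)          # True: the two zeros compare equal
print(math.copysign(1.0, neg_zero))  # -1.0: but -0 still carries its sign bit
print(math.inf, -math.inf)           # inf -inf
```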

But assuming the datatype `float` is 4 bytes, can't I calculate the range/maximum? — Genis, Oct 19 '14 at 14:01
@Compiled : That is 32 bits; you need to know how those are distributed between significand and exponent to determine the range. The IEEE754 binary32 format specifies a 24-bit significand (23 bits stored, plus an implicit leading 1), an 8-bit exponent and 1 sign bit. It is from these figures that you can calculate the range, but there is little need: the number of bits is standardised, so the range is standardised too. — Clifford, Oct 19 '14 at 17:35

score 0 · Answer 2 · edited Oct 19 '14 at 09:51

There are 3 different components: sign, exponent, mantissa

Assuming that the exponent has only 2 bits, 4 combinations are possible:

binary decimal
00     0
01     1
10     2
11     3

The represented floating-point value is 2^exponent:

binary exponent-value
00     2^0 = 1
01     2^1 = 2
10     2^2 = 4
11     2^3 = 8

The range of the floating-point value results from the exponent: 2 bits => maximum value = 8.

The mantissa divides the range from a given exponent value to the next higher one. For example, if the exponent value is 2 (that is, 2^1) and the mantissa has one bit, then two values are possible:

exponent-value  mantissa-binary  represented floating-point value
2               0                2
2               1                3

The represented floating-point value is 2^exponent × (1 + m1×2^-1 + m2×2^-2 + m3×2^-3 + …). Here an example with a 3 bit mantissa:

exponent-value  mantissa-binary  represented floating-point value
2               000              2 * (1                      ) = 2
2               001              2 * (1                + 2^-3) = 2.25
2               010              2 * (1         + 2^-2       ) = 2.5
2               011              2 * (1         + 2^-2 + 2^-3) = 2.75
2               100              2 * (1 + 2^-1               ) = 3
and so on…
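
The table above can be completed with a few lines of code. This is a sketch of the simplified toy format from this answer only (no sign bit, no exponent bias, no special values), and `toy_value` is a name invented here. An exponent-value of 2 in the table corresponds to `exponent=1`, since 2^1 = 2.

```python
def toy_value(exponent: int, mantissa_bits: str) -> float:
    """Compute 2^exponent * (1 + m1*2^-1 + m2*2^-2 + ...) for the toy format."""
    fraction = sum(int(bit) * 2.0 ** -(i + 1) for i, bit in enumerate(mantissa_bits))
    return 2.0 ** exponent * (1 + fraction)

# All eight 3-bit mantissa patterns for the exponent-value 2 (= 2^1).
for m in range(8):
    bits = format(m, "03b")
    print(bits, toy_value(1, bits))
```

The eight results run from 2.0 up to 3.75 in steps of 0.25; the next value, 4, needs the next higher exponent.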

The sign has just one bit:
0 -> positive value
1 -> negative value

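In IEEE-754 the 32-bit floating-point data type has an 8-bit exponent and a 23-bit mantissa. The exponent is stored with a bias of 127, so normal numbers range from 2^-126 to 2^127; the stored exponent values 0 and 255 are reserved for zero/subnormal numbers and for infinity/NaN.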

1 10000010 01101000000000000000000
- 130      1.40625

The represented floating-point value for this is:
-1 × 2^(130 - 127) × (1 + 2^-2 + 2^-3 + 2^-5) = -11.25
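
The same calculation can be sketched in code. The helper `decode_binary32` is hypothetical and only handles normal numbers (it ignores zero, subnormals, infinity and NaN); the cross-check uses Python's standard `struct` module to let the machine interpret the same 32 bits.

```python
import struct

def decode_binary32(bits: str):
    """Split a binary32 bit string into its fields and compute the value by hand."""
    bits = bits.replace(" ", "")
    sign = int(bits[0])
    biased_exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2) / 2 ** 23          # the 23 stored bits as 0.xxx
    value = (-1) ** sign * 2.0 ** (biased_exponent - 127) * (1 + fraction)
    return sign, biased_exponent, 1 + fraction, value

pattern = "1 10000010 01101000000000000000000"
print(decode_binary32(pattern))     # (1, 130, 1.40625, -11.25)

# Cross-check: reinterpret the same 32 bits as a single-precision float.
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
print(struct.unpack(">f", raw)[0])  # -11.25
```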

Try it: http://www.h-schmidt.net/FloatConverter/IEEE754.html

In IEEE754 the more accurate term *significand* is used rather than *mantissa*. Even the linked converter page has confused the terms, saying *"mantissa (also known as significand or fraction)"*, but "fraction" is the correct mathematical meaning of *mantissa* and not its meaning here. You have also somewhat glossed over the fact that the exponent itself is signed - only introducing that indirectly at the end. — Clifford, Oct 19 '14 at 09:43
