Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard, notably defining the single-precision binary32 (aka float) and double-precision binary64 (aka double) formats.

IEEE 754 is the Institute of Electrical and Electronics Engineers standard for floating-point computation, and is the most widely implemented standard of its kind.

Wikipedia on IEEE 754 (2008)
ieee.org documentation
https://en.wikipedia.org/wiki/Single-precision_floating-point_format aka binary32, usually called float or real4. It has nice diagrams of the bit pattern, the range over which it can represent every integer exactly, and so on.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format usually called double or real8
Algorithm to convert an IEEE 754 double to a string? including the recent Ryū: fast float-to-string conversion
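
As a rough illustration of the binary32 layout that the single-precision page diagrams (1 sign bit, 8 exponent bits biased by 127, 23 stored significand bits), here is a minimal Python sketch. The helper names `decode_binary32` and `to_binary32` are made up for this example; only the standard `struct` module is used.

```python
import struct

def decode_binary32(x):
    """Split x, rounded to binary32, into (sign, biased exponent, fraction bits)."""
    # Pack as a big-endian binary32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, bias 127
    fraction = bits & 0x7FFFFF             # 23 stored bits (leading 1 is implicit)
    return sign, biased_exponent, fraction

def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(decode_binary32(1.0))    # (0, 127, 0)
print(decode_binary32(-2.5))   # (1, 128, 2097152)

# Every integer up to 2**24 is exact in binary32; 2**24 + 1 is not.
print(to_binary32(2.0**24) == 2.0**24)      # True
print(to_binary32(2.0**24 + 1) == 2.0**24)  # True: rounded back down
```

The same approach works for binary64 by swapping the format codes to `">Q"` and `">d"` and adjusting the field widths (11 exponent bits, 52 fraction bits).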

As well as formats, IEEE 754 also defines the basic operations, + - * / and sqrt, as producing correctly-rounded results (error <= 0.5 ulp). Other functions like pow and sin are not required to be as accurate; that's an implementation choice between precision and performance.

This is why many CPU instruction sets provide only the basic operations (including sqrt) in hardware and leave the other functions to software libraries.
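
To make the "correctly rounded" requirement concrete, the sketch below compares a floating-point sum and product against exact rational arithmetic. The function names are hypothetical, and it assumes CPython 3.9+ on IEEE 754 binary64 hardware, where converting a `Fraction` to `float` rounds to nearest and `math.ulp` is available.

```python
import math
from fractions import Fraction

def is_correctly_rounded(a, b, op):
    """True if the float result of op(a, b) is the exact result rounded to nearest."""
    # Fraction(float) converts exactly, so op on Fractions has no rounding error.
    exact = op(Fraction(a), Fraction(b))
    return op(a, b) == float(exact)

def error_in_ulps(a, b, op):
    """Distance between the float result and the exact result, in ulps of the float result."""
    computed = op(a, b)
    exact = op(Fraction(a), Fraction(b))
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

add = lambda x, y: x + y
mul = lambda x, y: x * y

# 0.1 + 0.2 is not the double nearest 0.3, yet it is still the correctly
# rounded sum of the two doubles nearest 0.1 and 0.2.
print(0.1 + 0.2 == 0.3)                                  # False
print(is_correctly_rounded(0.1, 0.2, add))               # True
print(is_correctly_rounded(0.1, 0.2, mul))               # True
print(error_in_ulps(0.1, 0.2, add) <= Fraction(1, 2))    # True
```

The same check would not be expected to pass for every input to functions such as `math.sin` or `math.pow`, since the standard does not require those to be correctly rounded.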

1447 questions

3 answers

Why do we bias the exponent of a floating-point number?

I'm trying to wrap my head around this floating point representation of binary numbers, but I couldn't find, no matter where I looked, a good answer to the question. Why is the exponent biased? What's wrong with the good old reliable two's…

floating-point ieee-754 exponent

asked Nov 08 '13 at 17:10

so.very.tired

4 answers

Are IEEE floats valid key types for std::map and std::set?

Background The requirement for a comparator on the key type of an associative container (for example std::map) is that it imposes a strict weak order on the elements of the key type. For a given comparator comp(x, y) we define equiv(x, y) = !comp(x,…

c++ dictionary set ieee-754

asked Jan 27 '11 at 12:14

etarion

2 answers

How to get the IEEE 754 binary representation of a float in C#

I have some single and double precision floats that I want to write to and read from a byte[]. Is there anything in .NET I can use to convert them to and from their 32 and 64 bit IEEE 754 representations?

c# binary floating-point ieee-754

asked Nov 22 '10 at 19:30

HasaniH

2 answers

.NET Decimal, IEEE Decimal.... any discussions etc. about that and the future?

Note that I am not looking for something opinion based or some third party library - I merely want confirmation that nothing is planned (or pointers to some discussion by the powers that be). I tried Google and failed to find anything, so it looks like…

.net ieee-754

asked Mar 03 '14 at 08:28

TomTom

8 answers

Extracting the exponent and mantissa of a Javascript Number

Is there a reasonably fast way to extract the exponent and mantissa from a Number in Javascript? AFAIK there's no way to get at the bits behind a Number in Javascript, which makes it seem to me that I'm looking at a factorization problem: finding m…

javascript math haskell ghc ieee-754

asked Feb 21 '12 at 19:09

valderman

4 answers

Representing integers in doubles

Can a double (of a given number of bytes, with a reasonable mantissa/exponent balance) always fully precisely hold the range of an unsigned integer of half that number of bytes? E.g. can an eight byte double fully precisely hold the range of numbers…

c math floating-point precision ieee-754

asked Apr 17 '09 at 06:00

user82238

2 answers

How cross-platform is Google's Protocol Buffer's handling of floating-point types in practice?

Google's Protocol Buffers allows you to store floats and doubles in messages. I looked through the implementation source code wondering how they managed to do this in a cross-platform manner, and what I stumbled upon was: inline uint32…

c++ cross-platform floating-point protocol-buffers ieee-754

asked Aug 30 '11 at 19:49

Gregory Crosswhite

5 answers

Why does casting Double.NaN to int not throw an exception in Java?

I know that IEEE 754 specifies some special floating point values for values that are not real numbers. In Java, casting those values to a primitive int does not throw an exception like I would have expected. Instead we have the following: int…

java casting floating-point ieee-754

asked May 03 '11 at 22:16

Michael McGowan

7 answers

Compression algorithm for IEEE-754 data

Anyone have a recommendation on a good compression algorithm that works well with double precision floating point values? We have found that the binary representation of floating point values results in very poor compression rates with common…

floating-point compression ieee-754

asked Feb 10 '10 at 17:05

David Taylor

9 answers

Why converting from float to double changes the value?

I've been trying to find out the reason, but I couldn't. Can anybody help me? Look at the following example. float f = 125.32f; System.out.println("value of f = " + f); double d = (double) 125.32f; System.out.println("value of d = " + d); This is…

java floating-point double precision ieee-754

asked Jul 06 '13 at 16:29

arthursfreire

2 answers

Random generation of C programs with floating-point

Does anyone know a random generator of C programs that include floating-point computations? I am looking for something that would be a little bit like Csmith, except that Csmith does not generate floating-point expressions, and that it generates…

c floating-point ieee-754 random-testing

asked Dec 10 '11 at 21:20

Pascal Cuoq

13 answers

Floating Point to Binary Value(C++)

I want to take a floating point number in C++, like 2.25125, and get an int array filled with the binary value that is used to store the float in memory (IEEE 754). So I could take a number, and end up with an int num[16] array with the binary value of…

c++ binary floating-point ieee-754

asked Jan 23 '09 at 18:45

user58389

6 answers

Is there a floating point value of x, for which x-x == 0 is false?

In most cases, I understand that a floating point comparison test should be implemented over a range of values (abs(x-y) < epsilon), but does self subtraction imply that the result will be zero? // can the assertion be triggered? float x =…

floating-point floating-accuracy ieee-754

asked Apr 21 '10 at 21:19

Andrew Walker

3 answers

How do I convert from a decimal number to IEEE 754 single-precision floating-point format?

How would I go about manually changing a decimal (base 10) number into IEEE 754 single-precision floating-point format? I understand that there are three parts to it: a sign, an exponent, and a mantissa. I just don't completely understand what the…

binary floating-point ieee-754

asked Mar 08 '10 at 20:56

tgai

3 answers

Floating-point: "The leading 1 is 'implicit' in the significand." -- ...huh?

I'm learning about the representation of floating-point IEEE 754 numbers, and my textbook says: To pack even more bits into the significand, IEEE 754 makes the leading 1-bit of normalized binary numbers implicit. Hence, the number is actually 24…

ieee-754

asked Feb 08 '11 at 06:36

user541686
