Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (aka float) and double-precision binary64 (aka double).

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point computation, and is the most widely used such standard.

Wikipedia on IEEE 754 (2008)
ieee.org documentation
https://en.wikipedia.org/wiki/Single-precision_floating-point_format aka binary32, usually called float or real4. Has nice diagrams of the bit pattern, the range over which it can represent every integer exactly, and so on.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format usually called double or real8
Algorithm to convert an IEEE 754 double to a string? including the recent Ryū: fast float-to-string conversion
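
The binary32 bit pattern and the exact-integer range mentioned in the links above can be explored with a short Python sketch. The helper names float32_fields and to_float32 are illustrative only and do not come from any of the linked pages; the sketch assumes the standard struct module's "f" format maps to IEEE 754 binary32, which is the case on common platforms.

```python
import struct

def float32_fields(x):
    """Round x to binary32 and split it into (sign, biased exponent, fraction bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 explicitly stored significand bits
    return sign, exponent, fraction

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 45.25 = 1.0110101b * 2^5, so the biased exponent is 5 + 127 = 132.
print(float32_fields(45.25))  # (0, 132, 3473408)

# binary32 has a 24-bit significand: 2**24 is exact, 2**24 + 1 is not.
print(to_float32(float(2**24)) == 2**24)          # True
print(to_float32(float(2**24 + 1)) == 2**24 + 1)  # False
```

Every integer with magnitude up to 2**24 survives the round trip through binary32; 2**24 + 1 is the first one that does not.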

As well as formats, IEEE 754 also specifies the basic operations (+ - * / and sqrt) as producing correctly rounded results (error <= 0.5 ulp). Other functions like pow and sin are not required to be as accurate; their accuracy is an implementation trade-off between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
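
As a rough illustration of the 0.5 ulp bound, the following Python sketch compares a computed double against the exact rational result. The helpers error_in_ulps and sqrt_is_correctly_rounded are hypothetical names introduced here, and the sketch assumes Python 3.9+ (for math.ulp) on a platform whose float type is IEEE 754 binary64 and whose math.sqrt is correctly rounded.

```python
import math
from fractions import Fraction

def error_in_ulps(computed, exact):
    """Distance from a computed double to the exact result, in ulps of the computed value."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

def sqrt_is_correctly_rounded(x):
    """Check that the exact square root of x lies within half an ulp of math.sqrt(x)."""
    s = math.sqrt(x)
    half = Fraction(math.ulp(s)) / 2
    lo, hi = Fraction(s) - half, Fraction(s) + half
    # Compare squares so that no irrational value is ever needed.
    return lo * lo <= Fraction(x) <= hi * hi

half_ulp = Fraction(1, 2)

# Division: compare 1.0 / 3.0 with the exact rational 1/3.
print(error_in_ulps(1.0 / 3.0, Fraction(1, 3)) <= half_ulp)

# Addition: the exact sum of the two doubles nearest 0.1 and 0.2.
print(error_in_ulps(0.1 + 0.2, Fraction(0.1) + Fraction(0.2)) <= half_ulp)

# Square root.
print(sqrt_is_correctly_rounded(2.0))
```

The check uses exact rational arithmetic, so it measures only the rounding of the single operation, not the earlier error from writing 0.1 or 0.2 as decimal literals.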

1447 questions


9 answers

Portability of binary serialization of double/float type in C++

The C++ standard does not discuss the underlying layout of float and double types, only the range of values they should represent. (This is also true for signed types: is it two's complement or something else?) My question is: What are the…

asked Jan 19 '11 at 08:27

Matthieu N.


1 answer

What would cause the C/C++ <, <=, and == operators to return true if either argument is NaN?

My understanding of the rules of IEEE-754 floating-point comparisons is that all comparison operators except != will return false if either or both arguments are NaN, while the != operator will return true. I can easily reproduce this behavior with…

c++ c nan ieee-754

asked May 13 '14 at 21:05

Sean


5 answers

Do any real-world CPUs not use IEEE 754?

I'm optimizing a sorting function for a numerics/statistics library based on the assumption that, after filtering out any NaNs and doing a little bit twiddling, floats can be compared as 32-bit ints without changing the result and doubles can be…

performance sorting floating-point ieee-754

asked Feb 10 '10 at 04:44

dsimcha


2 answers

sign changes when going from int to float and back

Consider the following code, which is an SSCCE of my actual problem: #include <iostream> int roundtrip(int x) { return int(float(x)); } int main() { int a = 2147483583; int b = 2147483584; std::cout << a << " -> " << roundtrip(a)…

c++ floating-point type-conversion ieee-754 twos-complement

asked Dec 08 '13 at 12:34

fredoverflow


7 answers

In binary notation, what is the meaning of the digits after the radix point "."?

I have this example on how to convert from a base 10 number to IEEE 754 float representation Number: 45.25 (base 10) = 101101.01 (base 2) Sign: 0 Normalized form N = 1.0110101 * 2^5 Exponent esp = 5 E = 5 + 127 = 132 (base 10) = 10000100 (base…

c++ c floating-point ieee-754

asked Mar 07 '13 at 22:13

Johnny Pauling


4 answers

Does the C++ standard specify anything on the representation of floating point numbers?

For types T for which std::is_floating_point<T>::value is true, does the C++ standard specify anything on the way that T should be implemented? For example, does T even have to follow a sign/mantissa/exponent representation? Or can it be completely…

c++ c++11 floating-point standards ieee-754

asked Dec 15 '15 at 16:51

Vincent


4 answers

Converting IEEE 754 floating point in Haskell Word32/64 to and from Haskell Float/Double

Question In Haskell, the base libraries and Hackage packages provide several means of converting binary IEEE-754 floating point data to and from the lifted Float and Double types. However, the accuracy, performance, and portability of these methods…

haskell floating-point ghc ieee-754

asked Aug 08 '11 at 00:07

acfoltzer


6 answers

Ranges of floating point datatype in C?

I am reading a C book, talking about ranges of floating point, the author gave the table: Type Smallest Positive Value Largest value Precision ==== ======================= ============= ========= float 1.17549 x 10^-38 …

c floating-point ieee-754

asked Apr 11 '12 at 14:32

ipkiss


5 answers

Half-precision floating-point in Java

Is there a Java library anywhere that can perform computations on IEEE 754 half-precision numbers or convert them to and from double-precision? Either of these approaches would be suitable: Keep the numbers in half-precision format and compute…

java floating-point ieee-754 precision

asked May 28 '11 at 15:41

finnw


3 answers

The Double byte size in 32 bit and 64 bit OS

Is there a difference in double size when I run my app on 32 and 64 bit environment? If I am not mistaken the double in 32 bit environment will take up 16 digits after 0, whereas the double in 64 bit will take up 32 bit, am I right?

c# 64-bit floating-point double ieee-754

asked Jul 09 '09 at 01:55

Graviton


2 answers

Why is the square root of -Infinity +Infinity in Java?

I tried two different ways to find the square root in Java: Math.sqrt(Double.NEGATIVE_INFINITY); // NaN Math.pow(Double.NEGATIVE_INFINITY, 0.5); // Infinity Why doesn't the second way return the expected answer which is NaN (same as with the first…

java math floating-point ieee-754

asked Dec 29 '17 at 09:06

Pratik


3 answers

Is a float guaranteed to be preserved when transported through a double in C/C++?

Assuming IEEE-754 conformance, is a float guaranteed to be preserved when transported through a double? In other words, will the following assert always be satisfied? int main() { float f = some_random_float(); assert(f ==…

c++ c floating-point double ieee-754

asked Feb 08 '13 at 13:00

Kristian Spangsege


5 answers

How computer does floating point arithmetic?

I have seen long articles explaining how floating point numbers can be stored and how the arithmetic of those numbers is being done, but please briefly explain why when I write cout << 1.0 / 3.0 <

c++ math floating-point ieee-754

asked May 17 '11 at 15:24

Narek


2 answers

How to check if C++ compiler uses IEEE 754 floating point standard

I would like to ask a question that follows this one, which is pretty well answered by the define check if the compiler uses the standard. However, this works for C only. Is there a way to do the same in C++? I do not wish to convert floating point…

c++ compiler-construction floating-point ieee-754

asked Apr 25 '11 at 10:31

Rusty Horse


3 answers

Is there any accuracy gain when casting to double and back when doing float division?

What is the difference between two following? float f1 = some_number; float f2 = some_near_zero_number; float result; result = f1 / f2; and: float f1 = some_number; float f2 = some_near_zero_number; float result; result = (double)f1 /…

c floating-point floating-accuracy ieee-754

asked Feb 05 '15 at 12:17

Piotr Lopusiewicz
