Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (aka float) and double-precision binary64 (aka double).

IEEE 754 is the Institute of Electrical and Electronics Engineers standard for floating-point arithmetic, and the most widely used floating-point standard.

Wikipedia on IEEE 754 (2008)
ieee.org documentation
https://en.wikipedia.org/wiki/Single-precision_floating-point_format aka binary32, usually called float or real4. Includes diagrams of the bit pattern and the range over which every integer is exactly representable.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format usually called double or real8
Algorithm to convert an IEEE 754 double to a string? including the recent Ryū: fast float-to-string conversion
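
The bit pattern and exact-integer range described on the binary32 page above can be explored with a minimal Python sketch. The helper names `float32_fields` and `to_float32` are made up for illustration, and the sketch assumes the platform's `struct` module uses IEEE 754 binary32 for the `f` format, which is the case on mainstream systems.

```python
import struct

def float32_fields(x):
    """Round x to binary32 and split it into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 explicitly stored significand bits
    return sign, exponent, fraction

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    (y,) = struct.unpack(">f", struct.pack(">f", x))
    return y

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.5))   # (1, 128, 2097152), i.e. -1.25 * 2**1

# With a 24-bit significand (23 stored bits plus the implicit leading 1),
# every integer up to 2**24 is exact; 2**24 + 1 is the first one that is not.
print(to_float32(float(2**24)) == 2**24)          # True
print(to_float32(float(2**24 + 1)) == 2**24 + 1)  # False
```

The same reasoning with a 53-bit significand gives 2**53 as the corresponding limit for binary64.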

As well as formats, IEEE 754 also defines the basic operations (+, -, *, / and sqrt) as producing correctly rounded results (error <= 0.5 ulp in the default round-to-nearest mode). Other functions like pow and sin are not required to be as accurate; that's an implementation choice between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
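
As a rough illustration of what "correctly rounded" means for the basic operations, the following Python sketch compares floating-point results against exact rational arithmetic. The function names are hypothetical, and the sketch assumes Python 3.9+ (for `math.ulp`), binary64 Python floats, and a platform `sqrt` that follows IEEE 754.

```python
import math
from fractions import Fraction

def is_correctly_rounded_sum(a, b):
    """True if a + b equals the exact rational sum rounded to the nearest double."""
    exact = Fraction(a) + Fraction(b)   # exact, no rounding
    return float(exact) == a + b        # float(Fraction) rounds to nearest

def is_correctly_rounded_sqrt(x):
    """True if the exact square root of x lies within half an ulp of math.sqrt(x)."""
    r = math.sqrt(x)
    half_ulp = Fraction(math.ulp(r)) / 2
    lo, hi = Fraction(r) - half_ulp, Fraction(r) + half_ulp
    # Compare squares so that no irrational value has to be represented.
    return lo * lo <= Fraction(x) <= hi * hi

print(0.1 + 0.2 == 0.3)                    # False: the inputs are already rounded
print(is_correctly_rounded_sum(0.1, 0.2))  # True: the addition itself is exact-then-rounded
print(is_correctly_rounded_sqrt(2.0))      # True
```

The first line is the familiar surprise; the second shows that the addition is not at fault, since it returns the double nearest to the exact sum of the two (already inexact) operands.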

1447 questions

votes

6 answers

Large numbers erroneously rounded in JavaScript

See this code: var jsonString = '{"id":714341252076979033,"type":"FUZZY"}'; var jsonParsed = JSON.parse(jsonString); console.log(jsonString, jsonParsed); When I see my console in Firefox 3.5, the value of jsonParsed is the number rounded: Object…

asked Sep 04 '09 at 15:24

Jaanus

17,688
15
65
110

votes

5 answers

Are all integer values perfectly represented as doubles?

My question is whether all integer values are guaranteed to have a perfect double representation. Consider the following code sample that prints "Same": // Example program #include #include int main() { int a = 3; int b =…

c++ double standards precision ieee-754

asked Apr 27 '17 at 10:51

Thomas

4,696
5
36
71

votes

3 answers

Why does IEEE 754 reserve so many NaN values?

It seems that the IEEE 754 standard defines 16,777,214 32-bit floating point values as NaNs, or 0.4% of all possible values. I wonder what is the rationale for reserving so many useful values, while only 2 ones essentially needed: one for signaling…

floating-point nan ieee-754

asked Nov 05 '13 at 22:34

leventov

14,760
11
69
98

votes

10 answers

Formatting doubles for output in C#

Running a quick experiment related to Is double Multiplication Broken in .NET? and reading a couple of articles on C# string formatting, I thought that this: { double i = 10 * 0.69; Console.WriteLine(i); Console.WriteLine(String.Format("…

c# formatting floating-point ieee-754

asked Sep 14 '09 at 13:23

Pete Kirkham

48,893
5
92
171

votes

2 answers

Why does MSVS not optimize away +0?

This question demonstrates a very interesting phenomenon: denormalized floats slow down the code more than an order of magnitude. The behavior is well explained in the accepted answer. However, there is one comment, with currently 153 upvotes, that…

c floating-point compiler-optimization ieee-754 negative-zero

asked May 10 '13 at 07:11

Vorac

8,726
11
58
101

votes

4 answers

Are the bit patterns of NaNs really hardware-dependent?

I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format: seee eeee emmm mmmm mmmm mmmm mmmm mmmm s is the sign bit, e are the exponent bits, and m are the mantissa bits. A…

java floating-point nan ieee-754

asked Jul 31 '14 at 02:56

Boann

48,794
16
117
146

votes

7 answers

Double precision - decimal places

From what I have read, a value of data type double has an approximate precision of 15 decimal places. However, when I use a number whose decimal representation repeats, such as 1.0/7.0, I find that the variable holds the value of…

c++ c precision ieee-754

asked Apr 03 '12 at 18:31

nf313743

4,129
8
48
63

votes

3 answers

Usefulness of signaling NaN?

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating…

c++ visual-c++ floating-point ieee-754 x87

asked Feb 11 '10 at 20:19

user123456

votes

4 answers

How does this float square root approximation work?

I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain me why this code works? float sqrt(float f) { const int result = 0x1fbb4000 + (*(int*)&f >> 1); return *(float*)&result; …

c++ c optimization floating-point ieee-754

asked Mar 30 '17 at 13:57

YSC

38,212
9
96
149

votes

4 answers

Why does division by zero in IEEE754 standard results in Infinite value?

I'm just curious, why in IEEE-754 any non zero float number divided by zero results in infinite value? It's a nonsense from the mathematical perspective. So I think that correct result for this operation is NaN. Function f(x) = 1/x is not defined…

language-agnostic floating-point ieee-754

asked Feb 04 '13 at 07:17

Evgeny Lazin

9,193
6
47
83

votes

3 answers

Difference between Java's `Double.MIN_NORMAL` and `Double.MIN_VALUE`?

What's the difference between Double.MIN_NORMAL (introduced in Java 1.6) and Double.MIN_VALUE?

java ieee-754

asked Sep 16 '10 at 15:42

Cheok Yan Cheng

47,586
132
466
875

votes

3 answers

Why is Number.MAX_SAFE_INTEGER 9,007,199,254,740,991 and not 9,007,199,254,740,992?

ECMAScript 6's Number.MAX_SAFE_INTEGER supposedly represents the maximum numerical value JavaScript can store before issues arise with floating point precision. However it's a requirement that the number 1 added to this value must also be…

javascript ecmascript-6 integer ieee-754

asked Oct 15 '14 at 10:33

James Donnelly

126,410
34
208
218

votes

14 answers

32-bit to 16-bit Floating Point Conversion

I need a cross-platform library/algorithm that will convert between 32-bit and 16-bit floating point numbers. I don't need to perform math with the 16-bit numbers; I just need to decrease the size of the 32-bit floats so they can be sent over the…

c++ networking ieee-754

asked Nov 02 '09 at 04:42

Matt Fichman

5,458
4
39
59

votes

3 answers

Double vs float on the iPhone

I have just heard that the iPhone cannot do double natively, thereby making them much slower than regular float. Is this true? Evidence? I am very interested in the issue because my program needs high precision calculations, and I will have to…

iphone cocoa-touch floating-point ieee-754

asked Oct 26 '09 at 01:33

John Smith

12,491
18
65
111

votes

2 answers

Coercing floating-point to be deterministic in .NET?

I've been reading a lot about floating-point determinism in .NET, i.e. ensuring that the same code with the same inputs will give the same results across different machines. Since .NET lacks options like Java's strictfp and MSVC's fp:strict, the…

c# .net floating-point ieee-754 non-deterministic

asked Feb 13 '13 at 22:15

Asik

21,506
6
72
131
