Questions tagged [ieee-754]

IEEE 754 is the most common and widely used floating-point standard, notably its single-precision binary32 (aka float) and double-precision binary64 (aka double) formats.
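To make the two formats concrete, here is a minimal Python sketch that splits a value into its sign, biased exponent and fraction fields using only the standard library. It assumes Python's `float` is binary64, which holds on essentially all current platforms. `decode_binary32` first rounds the value to binary32 through `struct`'s `'f'` format. The function names are hypothetical helpers for illustration, not part of any library.

```python
import struct


def decode_binary32(x: float) -> tuple[int, int, int]:
    """Round x to binary32 and return its (sign, biased exponent, fraction) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, bias 127
    fraction = bits & 0x7FFFFF         # 23 stored bits (leading 1 is implicit for normals)
    return sign, exponent, fraction


def decode_binary64(x: float) -> tuple[int, int, int]:
    """Return the (sign, biased exponent, fraction) fields of x as binary64."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)  # 52 stored bits
    return sign, exponent, fraction
```

For example, `decode_binary32(1.0)` gives `(0, 127, 0)`, while `decode_binary32(0.1)` gives `(0, 123, 0x4CCCCD)`: 0.1 has no exact binary representation, so the stored fraction is a rounded bit pattern.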

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point computation, and it is the most common and widely used floating-point standard.

Besides formats, IEEE 754 also defines the basic operations (+, -, *, / and sqrt) as producing correctly rounded results: in the default round-to-nearest mode, the error is at most 0.5 ulp. Other functions such as pow and sin are not required to be as accurate; their accuracy is an implementation choice that trades precision against performance.

This is one reason why many CPU instruction sets include only the basic operations (including sqrt).
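The correct-rounding guarantee for the basic operations can be checked directly with exact rational arithmetic. The sketch below is an illustration under stated assumptions: CPython 3.9+ (for `math.nextafter`), Python floats being binary64, the default round-to-nearest mode, and finite results with no overflow. The helper names are hypothetical. `is_nearest` accepts a computed result only if the exact value lies no more than half the gap to the adjacent binary64 value away, which is the 0.5 ulp bound.

```python
import math
from fractions import Fraction


def is_nearest(exact: Fraction, r: float) -> bool:
    """Return True if r is a binary64 value nearest to exact (error <= 0.5 ulp).

    Assumes r is finite and its neighbour toward exact is finite as well.
    """
    if exact == r:
        return True
    # Step one binary64 value away from r, in the direction of the exact result.
    toward = math.inf if exact > r else -math.inf
    neighbour = math.nextafter(r, toward)
    gap = abs(Fraction(neighbour) - Fraction(r))
    # Round-to-nearest: the exact result may be at most half the gap from r.
    return abs(exact - Fraction(r)) <= gap / 2


def check_basic_ops(a: float, b: float) -> bool:
    """Compare +, -, * and / with exact rational results (b must be non-zero)."""
    fa, fb = Fraction(a), Fraction(b)
    return (is_nearest(fa + fb, a + b)
            and is_nearest(fa - fb, a - b)
            and is_nearest(fa * fb, a * b)
            and is_nearest(fa / fb, a / b))


def is_nearest_sqrt(x: float) -> bool:
    """Check math.sqrt(x) for x >= 0 without computing the irrational root.

    r is correctly rounded iff x lies between the squares of the midpoints
    from r to its lower and upper binary64 neighbours.
    """
    r = math.sqrt(x)
    fr = Fraction(r)
    lower_mid = (fr + Fraction(math.nextafter(r, 0.0))) / 2
    upper_mid = (fr + Fraction(math.nextafter(r, math.inf))) / 2
    return lower_mid * lower_mid <= Fraction(x) <= upper_mid * upper_mid
```

For instance, `0.1 + 0.2` evaluates to `0.30000000000000004`. That is the correctly rounded sum of the two stored binary64 inputs, so `is_nearest(Fraction(0.1) + Fraction(0.2), 0.1 + 0.2)` is `True`. It is not, however, the binary64 value nearest to 3/10, so `is_nearest(Fraction(3, 10), 0.1 + 0.2)` is `False`.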

1447 questions
83 votes, 6 answers

Large numbers erroneously rounded in JavaScript

See this code: var jsonString = '{"id":714341252076979033,"type":"FUZZY"}'; var jsonParsed = JSON.parse(jsonString); console.log(jsonString, jsonParsed); When I see my console in Firefox 3.5, the value of jsonParsed is the number rounded: Object…
Jaanus
70 votes, 5 answers

Are all integer values perfectly represented as doubles?

My question is whether all integer values are guaranteed to have a perfect double representation. Consider the following code sample that prints "Same": // Example program #include #include int main() { int a = 3; int b =…
Thomas
69 votes, 3 answers

Why does IEEE 754 reserve so many NaN values?

It seems that the IEEE 754 standard defines 16,777,214 32-bit floating-point values as NaNs, or 0.4% of all possible values. I wonder what the rationale is for reserving so many useful values, while only two are essentially needed: one for signaling…
leventov
69 votes, 10 answers

Formatting doubles for output in C#

Running a quick experiment related to "Is double Multiplication Broken in .NET?" and reading a couple of articles on C# string formatting, I thought that this: { double i = 10 * 0.69; Console.WriteLine(i); Console.WriteLine(String.Format("…
Pete Kirkham
65 votes, 2 answers

Why does MSVS not optimize away +0?

This question demonstrates a very interesting phenomenon: denormalized floats slow down the code more than an order of magnitude. The behavior is well explained in the accepted answer. However, there is one comment, with currently 153 upvotes, that…
Vorac
61 votes, 4 answers

Are the bit patterns of NaNs really hardware-dependent?

I was reading about floating-point NaN values in the Java Language Specification (I'm boring). A 32-bit float has this bit format: seee eeee emmm mmmm mmmm mmmm mmmm mmmm s is the sign bit, e are the exponent bits, and m are the mantissa bits. A…
Boann
59 votes, 7 answers

Double precision - decimal places

From what I have read, a value of data type double has an approximate precision of 15 decimal places. However, when I use a number whose decimal representation repeats, such as 1.0/7.0, I find that the variable holds the value of…
nf313743
59 votes, 3 answers

Usefulness of signaling NaN?

I've recently read up quite a bit on IEEE 754 and the x87 architecture. I was thinking of using NaN as a "missing value" in some numeric calculation code I'm working on, and I was hoping that using signaling NaN would allow me to catch a floating…
user123456
54 votes, 4 answers

How does this float square root approximation work?

I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain to me why this code works? float sqrt(float f) { const int result = 0x1fbb4000 + (*(int*)&f >> 1); return *(float*)&result; …
YSC
54 votes, 4 answers

Why does division by zero in the IEEE 754 standard result in an infinite value?

I'm just curious: why, in IEEE 754, does any non-zero float divided by zero result in an infinite value? It's nonsense from the mathematical perspective, so I think the correct result for this operation is NaN. Function f(x) = 1/x is not defined…
Evgeny Lazin
53 votes, 3 answers

Difference between Java's `Double.MIN_NORMAL` and `Double.MIN_VALUE`?

What's the difference between Double.MIN_NORMAL (introduced in Java 1.6) and Double.MIN_VALUE?
Cheok Yan Cheng
52 votes, 3 answers

Why is Number.MAX_SAFE_INTEGER 9,007,199,254,740,991 and not 9,007,199,254,740,992?

ECMAScript 6's Number.MAX_SAFE_INTEGER supposedly represents the maximum numerical value JavaScript can store before issues arise with floating point precision. However it's a requirement that the number 1 added to this value must also be…
James Donnelly
48 votes, 14 answers

32-bit to 16-bit Floating Point Conversion

I need a cross-platform library/algorithm that will convert between 32-bit and 16-bit floating point numbers. I don't need to perform math with the 16-bit numbers; I just need to decrease the size of the 32-bit floats so they can be sent over the…
Matt Fichman
47 votes, 3 answers

Double vs float on the iPhone

I have just heard that the iPhone cannot do double natively, thereby making doubles much slower than regular float. Is this true? Evidence? I am very interested in the issue because my program needs high-precision calculations, and I will have to…
John Smith
47 votes, 2 answers

Coercing floating-point to be deterministic in .NET?

I've been reading a lot about floating-point determinism in .NET, i.e. ensuring that the same code with the same inputs will give the same results across different machines. Since .NET lacks options like Java's strictfp and MSVC's fp:strict, the…
Asik