Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (float) and double-precision binary64 (double).

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic, and the one most widely implemented in hardware and software.

As well as formats, IEEE 754 defines the basic operations (+, -, *, / and sqrt) as producing correctly rounded results (error <= 0.5 ulp in the default round-to-nearest mode). Other functions like pow and sin are not required to be as accurate; there, each implementation chooses its own trade-off between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
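As a rough illustration of "correctly rounded", the Python sketch below compares a hardware addition against the exact rational sum of the same two doubles. It assumes Python 3.9+ (for `math.ulp`) and a platform whose `float` is binary64; the variable names are made up for this example.

```python
import math
from fractions import Fraction

a, b = 0.1, 0.2
computed = a + b                    # result of the floating-point addition
exact = Fraction(a) + Fraction(b)   # exact rational sum of the two stored doubles

# Rounding the exact sum to the nearest double reproduces the computed result.
assert computed == float(exact)

# The rounding error is at most half a unit in the last place (ulp).
error = abs(Fraction(computed) - exact)
assert error <= Fraction(math.ulp(computed)) / 2

print(computed)  # 0.30000000000000004
```

Note that `exact` is the sum of the doubles nearest to 0.1 and 0.2, not the decimal value 0.3; the standard only promises correct rounding of the operation on the stored operands.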

1447 questions
14
votes
2 answers

Can't get 0.30000000000000004 by calculating

When I run in the console 0.1 + 0.2 the result is 0.30000000000000004. So I tried to calculate it myself. Here are the steps I've taken. 1) Represent 0.1 as IEEE754 double: 0.1 = 0 01111111011 1001100110011001100110011001100110011001100110011010 2)…
Max Koretskyi
  • 101,079
  • 60
  • 333
  • 488
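The bit pattern quoted in step 1 of this question can be checked mechanically. The following is a minimal Python sketch, assuming a binary64 `float`; `double_fields` is a hypothetical helper name, not part of any library.

```python
import struct

def double_fields(x: float) -> tuple[str, str, str]:
    """Split a binary64 value into sign, exponent and fraction bit strings."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = format(bits, "064b")
    return s[0], s[1:12], s[12:]   # 1 sign bit, 11 exponent bits, 52 fraction bits

sign, exponent, fraction = double_fields(0.1)
print(sign, exponent, fraction)
# 0 01111111011 1001100110011001100110011001100110011001100110011010
```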
14
votes
2 answers

Why the IEEE-754 exponent bias used in this C code is 126.94269504 instead of 127?

The following C function is from fastapprox project. static inline float fasterlog2 (float x) { union { float f; uint32_t i; } vx = { x }; float y = vx.i; y *= 1.1920928955078125e-7f; return y - 126.94269504f; } Could some experts here…
Astaroth
  • 2,241
  • 18
  • 35
14
votes
7 answers

Do-s and Don't-s for floating point arithmetic?

What are some good do-s and don't-s for floating point arithmetic (IEEE754 in case there's confusion) to ensure good numerical stability and high accuracy in your results? I know a few like don't subtract quantities of similar magnitude, but I'm…
gct
  • 14,100
  • 15
  • 68
  • 107
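The "don't subtract quantities of similar magnitude" advice mentioned in this question can be seen in a small Python sketch. The choice of 1 - cos(x) and of the rewritten form 2*sin(x/2)**2 is an illustrative assumption, not something taken from the question's answers.

```python
import math

x = 1e-8
reference = x * x / 2                 # leading term of 1 - cos(x) for tiny x

naive = 1.0 - math.cos(x)             # cos(x) rounds to (almost) 1, so the digits cancel
stable = 2.0 * math.sin(x / 2) ** 2   # same quantity without subtracting near-equal values

print(naive, stable, reference)
```

The naive form loses essentially all significant digits, while the rewritten form stays close to the reference value of about 5e-17.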
14
votes
4 answers

CLR JIT optimizations violates causality?

I was writing an instructive example for a colleague to show him why testing floats for equality is often a bad idea. The example I went with was adding .1 ten times, and comparing against 1.0 (the one I was shown in my introductory numerical…
Gobiner
  • 195
  • 7
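The experiment described in this question (adding .1 ten times and comparing against 1.0) looks like this in Python. This is only a sketch of the arithmetic; it says nothing about the CLR JIT behavior the question asks about. An explicit loop is used because the built-in `sum` may apply compensated summation in recent Python versions.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1                     # each addition rounds to the nearest double

print(total)                         # 0.9999999999999999
print(total == 1.0)                  # False
print(math.isclose(total, 1.0))      # True: compare with a tolerance instead
```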
14
votes
2 answers

flush-to-zero behavior in floating-point arithmetic

While, as far as I remember, IEEE 754 says nothing about a flush-to-zero mode to handle denormalized numbers faster, some architectures offer this mode (e.g. http://docs.sun.com/source/806-3568/ncg_lib.html ). In the particular case of this…
Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
14
votes
2 answers

Is SSE floating-point arithmetic reproducible?

The x87 FPU is notable for using an internal 80-bit precision mode, which often leads to unexpected and unreproducible results across compilers and machines. In my search for reproducible floating-point math on .NET, I discovered that both major…
Asik
  • 21,506
  • 6
  • 72
  • 131
14
votes
4 answers

Incorrect floating point behavior

When I run the below C++ program in a 32-bit powerpc kernel which supports software floating-point emulation (hardware floating point disabled), I get an incorrect conditional evaluation. Can someone tell me what's the potential problem here? #include…
rajachan
  • 795
  • 1
  • 6
  • 20
14
votes
2 answers

Math.pow with negative numbers and non-integer powers

The ECMAScript specification for Math.pow has the following peculiar rule: If x < 0 and x is finite and y is finite and y is not an integer, the result is NaN. (http://es5.github.com/#x15.8.2.13) As a result Math.pow(-8, 1 / 3) gives NaN rather…
Nathan Wall
  • 10,530
  • 4
  • 24
  • 47
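Python's `math.pow` follows a similar rule to the one quoted from ECMAScript, except that it raises `ValueError` instead of returning NaN. The sketch below shows that, plus a hypothetical helper `real_cbrt` (an assumption for illustration) that computes a real cube root by handling the sign separately.

```python
import math

def real_cbrt(x: float) -> float:
    """Hypothetical helper: real cube root obtained via the magnitude and the sign."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

try:
    math.pow(-8.0, 1.0 / 3.0)        # negative base, non-integer exponent
    raised = False
except ValueError:
    raised = True

print(raised)            # True
print(real_cbrt(-8.0))   # approximately -2.0
```

The double nearest to 1/3 is not exactly one third, so the power function cannot tell that a cube root was intended.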
14
votes
1 answer

Why is pow(-infinity, positive non-integer) +infinity?

C99 annex F (IEEE floating point support) says this: pow(−∞, y) returns +∞ for y > 0 and not an odd integer. But, say, (−∞)^0.5 actually has the imaginary values ±∞i, not +∞. C99’s own sqrt(−∞) returns a NaN and generates a domain error as…
Chortos-2
  • 995
  • 7
  • 20
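The asymmetry described in this question can be observed from Python, whose `math.pow` is documented as following C99 Annex F where possible. This is a sketch of the observable behavior in CPython, not an explanation of the rationale.

```python
import math

inf = math.inf

pow_half = math.pow(-inf, 0.5)   # y > 0 and not an odd integer
pow_odd = math.pow(-inf, 3.0)    # y is an odd integer, so the sign is kept

try:
    math.sqrt(-inf)
    sqrt_raised = False
except ValueError:               # Python reports the domain error as an exception
    sqrt_raised = True

print(pow_half, pow_odd, sqrt_raised)  # inf -inf True
```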
13
votes
5 answers

How many distinct floating-point numbers in a specific range?

How many representable floats are there between 0.0 and 0.5? And how many representable floats are there between 0.5 and 1.0? I'm more interested in the math behind it, and I need the answer for floats and doubles.
Arlen
  • 6,641
  • 4
  • 29
  • 61
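One way to approach the counting question: for non-negative finite values, the IEEE 754 bit patterns are ordered the same way as the numbers themselves, so the count of values in [a, b) is the difference of the two bit patterns read as integers. A minimal Python sketch, with hypothetical helper names:

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x as binary32 (x must be exactly representable)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f64_bits(x: float) -> int:
    """Bit pattern of x as binary64."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

for name, bits in (("float", f32_bits), ("double", f64_bits)):
    low = bits(0.5) - bits(0.0)    # values in [0.0, 0.5), subnormals included
    high = bits(1.0) - bits(0.5)   # values in [0.5, 1.0)
    print(name, low, high)
# float 1056964608 8388608
# double 4602678819172646912 4503599627370496
```

The interval [0.5, 1.0) covers a single exponent, so it holds 2^23 floats or 2^52 doubles, while [0.0, 0.5) covers every smaller exponent plus the subnormals and zero.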
13
votes
1 answer

subnormal IEEE 754 floating point numbers support on iOS ARM devices (iPhone 4)

While porting an application from Linux x86 to iOS ARM (iPhone 4), I've discovered a difference in behavior in floating-point arithmetic with small values. 64-bit floating-point numbers (double) smaller than [+/-]2.2250738585072014E-308 are called…
Yann Droneaud
  • 5,277
  • 1
  • 23
  • 39
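The threshold quoted in this question is the smallest positive normal double. The Python sketch below shows what lies beneath it on a platform with gradual underflow; it assumes Python 3.9+ and that the host does not flush subnormals to zero, which is exactly the behavior the question reports as differing on some ARM configurations.

```python
import math
import sys

smallest_normal = sys.float_info.min   # 2.2250738585072014e-308
half = smallest_normal / 2             # subnormal when gradual underflow is supported
smallest_subnormal = math.ulp(0.0)     # 5e-324, the smallest positive double

print(smallest_normal, half, smallest_subnormal)
print(half > 0.0)                      # True here; a flush-to-zero mode would give 0.0
```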
13
votes
4 answers

Are JS engines allowed to change the bits of a NaN?

In JavaScript, the NaN value can be represented by a wide range of 64-bit doubles internally. Specifically, any double with the following bitwise representation: x111 1111 1111 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx Is…
MaiaVictor
  • 51,090
  • 44
  • 144
  • 286
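To make the bit pattern in this question concrete, the Python sketch below builds doubles from several 64-bit patterns with all exponent bits set. One detail worth noting: the fraction bits must not all be zero, otherwise the value is an infinity rather than a NaN. `double_from_bits` is a hypothetical helper, and the sketch deliberately makes no claim about whether payload bits survive a round trip.

```python
import math
import struct

def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

nan_patterns = [
    0x7FF8000000000000,   # the usual quiet NaN
    0x7FF8000000000001,   # quiet NaN with a payload bit set
    0xFFF8000000000000,   # sign bit set
    0x7FFFFFFFFFFFFFFF,   # all fraction bits set
]
all_nan = all(math.isnan(double_from_bits(p)) for p in nan_patterns)
infinity = double_from_bits(0x7FF0000000000000)   # zero fraction: not a NaN

print(all_nan, infinity)  # True inf
```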
13
votes
6 answers

IEEE-754 compliant round-half-to-even

The C standard library provides the round, lround, and llround family of functions in C99. However, these functions are not IEEE-754 compliant, because they do not implement the "banker's rounding" of half-to-even as mandated by IEEE. Half-to-even…
68ejxfcj5669
  • 525
  • 3
  • 11
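For comparison with the C functions discussed in this question, Python's built-in `round` does use round-half-to-even. The sketch below only illustrates the rounding rule on values where the tie is exact in binary; it is not a C solution.

```python
# x.5 values are exactly representable in binary, so these are true ties.
ties = [0.5, 1.5, 2.5, 3.5]
rounded = [round(t) for t in ties]

print(rounded)  # [0, 2, 2, 4]: each tie goes to the nearest even integer
```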
13
votes
3 answers

What is the result of comparing a number with NaN?

Consider for example bool fun (double a, double b) { return a < b; } What will fun return if any of the arguments are NaN? Is this undefined / implementation defined behavior? What happens with the other relational operators and the equality…
Baum mit Augen
  • 49,044
  • 25
  • 144
  • 182
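The comparison semantics asked about here can be tried out in Python, whose float comparisons follow the IEEE 754 unordered rules. This is an illustrative sketch and does not settle what the C++ standard guarantees.

```python
import math

nan = math.nan

results = {
    "nan < 1.0": nan < 1.0,
    "nan > 1.0": nan > 1.0,
    "nan <= nan": nan <= nan,
    "nan == nan": nan == nan,
    "nan != nan": nan != nan,
}
for expr, value in results.items():
    print(expr, value)
# Every comparison is False except nan != nan, which is True.
```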
13
votes
4 answers

What is (+0)+(-0) by IEEE floating point standard?

Am I right that any arithmetic operation on any floating-point numbers is unambiguously defined by the IEEE floating-point standard? If yes, just for curiosity, what is (+0)+(-0)? And is there a way to check such things in practice, in C++ or other commonly…
se0808
  • 556
  • 3
  • 18
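One practical way to check signed-zero results is to inspect the sign with `copysign`, since +0 and -0 compare equal. A small Python sketch, assuming the default round-to-nearest mode; `sign_of` is a made-up helper name.

```python
import math

def sign_of(x: float) -> float:
    """Return 1.0 or -1.0 according to the sign bit, which also works for zeros."""
    return math.copysign(1.0, x)

mixed = 0.0 + -0.0        # zeros of opposite sign
negative = -0.0 + -0.0    # both zeros negative

print(sign_of(mixed))     # 1.0: the sum is +0 under round-to-nearest
print(sign_of(negative))  # -1.0: the sum is -0
print(0.0 == -0.0)        # True: the two zeros compare equal
```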