Questions tagged [ieee-754]

IEEE 754 is the most common and widely used floating-point standard, notably its single-precision binary32 (a.k.a. float) and double-precision binary64 (a.k.a. double) formats.

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic, and it is by far the most widely implemented floating-point specification.

As well as formats, IEEE 754 defines the basic operations (+, -, *, / and sqrt) and requires them to produce correctly rounded results; in the default round-to-nearest mode, the error is at most 0.5 ulp. Other functions, such as pow and sin, are not required to be correctly rounded; their accuracy is an implementation choice that trades precision against performance.

This is why many CPU instruction sets include only the basic operations (including sqrt) and leave functions such as pow and sin to software libraries.
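The correct-rounding guarantee for the basic operations can be checked directly. The following Python sketch (Python 3.9+ for `math.ulp` and `math.nextafter`) measures how far a double lies from an exact rational value, in units of that double's ulp; the helper name `error_in_ulps` is illustrative, not a standard API.

```python
import math
from fractions import Fraction

def error_in_ulps(result: float, exact: Fraction) -> Fraction:
    """Distance between a double and an exact value, measured in ulps of the double."""
    return abs(Fraction(result) - exact) / Fraction(math.ulp(result))

exact = Fraction(1, 3)
q = 1.0 / 3.0                      # IEEE 754 division, default round-to-nearest
neighbor = math.nextafter(q, 1.0)  # the next double above q

print(error_in_ulps(q, exact))         # 1/3  (within the 0.5 ulp bound)
print(error_in_ulps(neighbor, exact))  # 2/3  (the neighbor is farther away)
```

The ratio uses the ulp above `result`, which is a slight simplification when `result` is an exact power of two.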

1447 questions
16 votes, 0 answers

On 32-bit machines, atan2 is nondeterministic when I don't store its result in a variable. Why?

Consider this piece of C code: #include #include #include bool foo(int a, int b, int c, int d) { double P = atan2(a, b); double Q = atan2(c, d); return P < Q; } bool bar(int a, int b, int c, int d) { …
Maya
16 votes, 5 answers

What are the applications/benefits of an 80-bit extended precision data type?

Yeah, I meant to say 80-bit. That's not a typo... My experience with floating point variables has always involved 4-byte multiples, like singles (32 bit), doubles (64 bit), and long doubles (which I've seen referred to as either 96-bit or 128-bit).…
gnovice
16 votes, 2 answers

Fused multiply add and default rounding modes

With GCC 5.3, the following code compiled with -O3 -mfma: float mul_add(float a, float b, float c) { return a*b + c; } produces the following assembly: vfmadd132ss %xmm1, %xmm2, %xmm0 ret. I noticed GCC doing this with -O3 already in GCC…
Z boson
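To see why contracting `a*b + c` into a fused multiply-add can change results, the following Python sketch emulates a fused operation by computing `a*b + c` exactly with `fractions.Fraction` and rounding only once. `fused_mul_add` is a hypothetical helper, not GCC's code generation; the inputs are chosen so that rounding the separate product discards exactly the bits the fused version keeps.

```python
from fractions import Fraction

def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate fma(a, b, c): exact a*b + c, rounded once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # int/int true division in CPython is correctly rounded

a = b = 1.0 + 2.0**-27     # exactly representable
c = -(1.0 + 2.0**-26)      # exactly representable

separate = a * b + c       # product rounded to a double, then the sum rounded again
fused = fused_mul_add(a, b, c)

print(separate)  # 0.0
print(fused)     # 5.551115123125783e-17, i.e. 2**-54
```

The two forms are therefore not interchangeable bit for bit; whether a C compiler may contract them is governed by `#pragma STDC FP_CONTRACT` and, in GCC, the `-ffp-contract` option.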
16 votes, 1 answer

Why does GCC yield -nan and clang and intel yield +nan for 0.0/0.0?

When I was debugging code, I found that GCC and Clang both yield nan for 0.0/0.0, which is what I was expecting, but GCC yields a nan with the sign bit set to 1, while Clang sets it to 0 (in agreement with ICC, if I remember correctly). Now…
Johannes Schaub - litb
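IEEE 754-2008 does not interpret the sign of a NaN produced by an operation such as 0.0/0.0; only the bit-level operations (copy, negate, abs, copySign) define it. The following Python sketch shows one way to inspect that sign bit. Python raises `ZeroDivisionError` for `0.0/0.0`, so NaNs with a known sign are built with `math.copysign` instead; the helper `sign_bit` is illustrative.

```python
import math
import struct

def sign_bit(x: float) -> int:
    """Return the IEEE 754 sign bit (0 or 1) of a binary64 value, NaNs included."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits >> 63

pos_nan = math.copysign(math.nan, 1.0)
neg_nan = math.copysign(math.nan, -1.0)

print(math.isnan(pos_nan), math.isnan(neg_nan))  # True True
print(sign_bit(pos_nan), sign_bit(neg_nan))      # 0 1
print(pos_nan == pos_nan)                        # False: NaN compares unequal to everything
```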
16 votes, 8 answers

Fast nearest power of 2 in JavaScript?

Is there any faster alternative to the following expression: Math.pow(2,Math.floor(Math.log(x)/Math.log(2)))? That is, taking the closest (smaller) integer power of 2 of a double. I have such an expression in an inner loop. I suspect it could be much…
MaiaVictor
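A sketch of the underlying idea, in Python rather than JavaScript: the exponent field of a double already encodes the largest power of two not exceeding it, so it can be read directly instead of being computed with logarithms (whose rounded quotient may be slightly off near exact powers of two). `floor_pow2` is a hypothetical name; `math.frexp` and `math.ldexp` are standard-library functions.

```python
import math

def floor_pow2(x: float) -> float:
    """Largest power of two <= x, for finite x > 0, read from the binary exponent."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("x must be finite and positive")
    _, e = math.frexp(x)           # x == m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 1)  # 2**(e - 1), computed exactly

print(floor_pow2(1000.0))  # 512.0
print(floor_pow2(0.3))     # 0.25
print(floor_pow2(8.0))     # 8.0
```

Because `frexp` works on subnormals too, the same code covers the whole positive range of doubles.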
16 votes, 2 answers

Extreme numerical values in floating-point precision in R

Can somebody please explain the following output to me? I know that it has something to do with floating-point precision, but the order of magnitude (difference 1e308) surprises me. 0: high precision > 1e-324==0 [1] TRUE > 1e-323==0 [1] FALSE 1: very…
user3370602
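R's numeric type is an IEEE 754 double, so the same behavior can be reproduced in Python. The smallest positive double is the subnormal 2**-1074 (about 4.94e-324); the literal 1e-324 is less than half of that and rounds to zero, while 1e-323 rounds to 2 * 2**-1074. The following sketch checks this.

```python
import math
import sys

smallest = math.ulp(0.0)  # smallest positive subnormal double, 2**-1074
print(smallest == math.ldexp(1.0, -1074))  # True
print(sys.float_info.min)                  # smallest *normal* double, about 2.2e-308
print(sys.float_info.max)                  # largest finite double, about 1.8e308

print(1e-324 == 0.0)           # True: below smallest / 2, so it rounds to zero
print(1e-323 == 0.0)           # False
print(1e-323 == 2 * smallest)  # True: 1e-323 / smallest is about 2.02, which rounds to 2
```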
16 votes, 1 answer

Are there any whole numbers which the double cannot represent within the MIN/MAX range of a double?

I realize that whenever one is dealing with IEEE 754 doubles and floats, some numbers can't be represented especially when one tries to represent numbers with lots of digits after the decimal point. This is well understood but I was curious if…
Brett
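A binary64 double has a 53-bit significand, so every integer with magnitude up to 2**53 is exactly representable, but above that the spacing between consecutive doubles exceeds 1 and some whole numbers are skipped. The following Python sketch shows the first positive integer that cannot be represented; Python compares `int` and `float` exactly, which makes the rounding visible.

```python
import math
import sys

print(sys.float_info.mant_dig)  # 53 significand bits, counting the implicit leading 1

limit = 2**53
print(float(limit) == limit)          # True
print(float(limit + 1) == limit + 1)  # False: 2**53 + 1 is a tie and rounds to even, 2**53
print(float(limit + 1))               # 9007199254740992.0

# The spacing between adjacent doubles at 2**53 is already 2
print(math.ulp(float(limit)))         # 2.0
```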
16 votes, 3 answers

What is long double on x86-64?

Someone told me that: "Under x86-64, FP arithmetic is done with SSE, and therefore long double is 64 bits." But the x86-64 ABI says: C type: long double; sizeof: 16; alignment: 16; AMD64 architecture: 80-bit extended (IEEE-754). See:…
Andrew Tomazos
16 votes, 4 answers

How to simulate Single precision rounding with Doubles?

I had a problem where I was trying to reconstruct the formula used in an existing system, a fairly simple formula of one input and one output: y = f(x). After a lot of puzzling, we managed to figure out the formula that fit our observed data…
Ian Boyd
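One common way to emulate binary32 arithmetic with doubles is to compute each operation in double precision and round the result to the nearest float immediately afterwards. For +, -, * and /, rounding twice this way is known to give the same result as native single-precision arithmetic, because binary64 carries more than twice as many significand bits (53 ≥ 2·24 + 2). The following Python sketch uses `struct`'s `'f'` format to perform that rounding; `to_f32`, `mul_f32` and `add_f32` are hypothetical helper names.

```python
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest binary32 value, returned as a double.

    struct raises OverflowError if x is finite but too large for binary32.
    """
    return struct.unpack("f", struct.pack("f", x))[0]

def mul_f32(a: float, b: float) -> float:
    """Single-precision multiply emulated with doubles: operands and result rounded to binary32."""
    return to_f32(to_f32(a) * to_f32(b))

def add_f32(a: float, b: float) -> float:
    """Single-precision add emulated with doubles."""
    return to_f32(to_f32(a) + to_f32(b))

print(to_f32(0.1))               # 0.10000000149011612
print(to_f32(16777217.0))        # 16777216.0: 2**24 + 1 needs 25 significand bits
print(add_f32(16777216.0, 1.0))  # 16777216.0, as in native float arithmetic
```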
15 votes, 2 answers

On the float_precision argument to pandas.read_csv

The documentation for the argument in this post's title says: float_precision : string, default None Specifies which converter the C engine should use for floating-point values. The options are None for the ordinary converter, high for the…
kjo
15 votes, 3 answers

Are there any modern platforms with non-IEEE C/C++ float formats?

I am writing a video game, Humm and Strumm, which requires a network component in its game engine. I can deal with differences in endianness easily, but I have hit a wall in attempting to deal with possible float memory formats. I know that modern…
Patrick Niedzielski
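For the network-serialization part of this question, one portable approach is to put floats on the wire in an explicit IEEE 754 byte format rather than copying memory. As an illustration in Python (not the asker's C/C++ engine), the `struct` module documents that its `'d'` and `'f'` codes use the IEEE 754 binary64 and binary32 formats regardless of the platform's native float format, and `'>'` selects big-endian (network) byte order. The helper names are hypothetical.

```python
import struct
import sys

def encode_f64(x: float) -> bytes:
    """Encode a double as 8 big-endian bytes in IEEE 754 binary64 format."""
    return struct.pack(">d", x)

def decode_f64(data: bytes) -> float:
    """Decode 8 big-endian IEEE 754 binary64 bytes back into a float."""
    (x,) = struct.unpack(">d", data)
    return x

print(encode_f64(1.0).hex())                 # 3ff0000000000000
print(decode_f64(encode_f64(-2.5)) == -2.5)  # True
# Sanity check that the native float type looks like binary64
print(sys.float_info.radix, sys.float_info.mant_dig, sys.float_info.max_exp)  # 2 53 1024
```

In C++, `std::numeric_limits<float>::is_iec559` offers a comparable compile-time check of the native format.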
15 votes, 3 answers

Why does table-based sin approximation literature always use this formula when another formula seems to make more sense?

The literature on computing the elementary function sin with tables refers to the formula: sin(x) = sin(Cn) * cos(h) + cos(Cn) * sin(h) where x = Cn + h, Cn is a constant for which sin(Cn) and cos(Cn) have been pre-computed and are available in a…
Pascal Cuoq
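The formula quoted in this question can be sketched in a few lines of Python. The table spacing (1/64), the restriction to 0 ≤ x ≤ 4, and the short Taylor polynomials for sin(h) and cos(h) are hypothetical choices for illustration; a real implementation might precompute sin(Cn) and cos(Cn) more accurately than `math.sin`/`math.cos` and pick constants to meet a specific error budget.

```python
import math

STEP = 1.0 / 64   # spacing between table points Cn (hypothetical choice)
N = 256           # table covers 0 <= Cn <= 4
SIN_TABLE = [math.sin(n * STEP) for n in range(N + 1)]
COS_TABLE = [math.cos(n * STEP) for n in range(N + 1)]

def table_sin(x: float) -> float:
    """sin(x) for 0 <= x <= 4 via sin(Cn + h) = sin(Cn)*cos(h) + cos(Cn)*sin(h)."""
    if not 0.0 <= x <= N * STEP:
        raise ValueError("x outside table range")
    n = round(x / STEP)   # nearest table point, so |h| <= STEP / 2
    h = x - n * STEP      # exact, since STEP is a power of two
    h2 = h * h
    # Taylor polynomials; with |h| <= 1/128 the truncation error is below about 4e-19
    sin_h = h * (1.0 - h2 / 6.0 * (1.0 - h2 / 20.0))
    cos_h = 1.0 - h2 / 2.0 * (1.0 - h2 / 12.0 * (1.0 - h2 / 30.0))
    return SIN_TABLE[n] * cos_h + COS_TABLE[n] * sin_h

print(table_sin(1.0), math.sin(1.0))
```

Over its range this sketch should agree with `math.sin` to within a few ulps.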
14 votes, 2 answers

Handling money value, is it safe to divide a number by 100?

In the repository code, in a module developed by another team, I discovered that a price is converted from cents to euros by simply dividing the number by 100. The code is in JavaScript, so it uses the IEEE 754 standard. I know that is not…
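Dividing an integer number of cents by 100 is a single correctly rounded IEEE 754 operation, so the result is the same double you would get by parsing the decimal string (for example, 1999 / 100 equals the double for 19.99). Problems appear when such values go through further additions or multiplications. The following Python sketch illustrates both points; JavaScript numbers are the same binary64 type, so the arithmetic behaves identically. `format_cents` is a hypothetical helper that assumes non-negative amounts.

```python
from decimal import Decimal

def format_cents(cents: int) -> str:
    """Format a non-negative integer amount of cents as euros, without float arithmetic."""
    euros, rest = divmod(cents, 100)
    return f"{euros}.{rest:02d}"

price = 1999 / 100
print(price)           # 19.99: the nearest double to 19.99, printed in shortest form
print(Decimal(price))  # the exact binary value stored, which is not exactly 19.99
print(price == 19.99)  # True: one correctly rounded division

print(0.10 + 0.20 == 0.30)    # False: each addition rounds again
print(format_cents(10 + 20))  # 0.30, computed exactly in integer cents
```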
14 votes, 3 answers

Go float comparison

In order to compare two floats (float64) for equality in Go, my superficial understanding of IEEE 754 and binary representation of floats makes me think that this is a good solution: func Equal(a, b float64) bool { ba := math.Float64bits(a) …
augustzf
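Comparing raw bit patterns from `math.Float64bits` only detects exact bit equality, and treats 0.0 and -0.0 (which compare equal) as different. A common refinement is to measure the distance in ulps between two doubles. The following sketch is written in Python; `struct.pack("<d", x)` yields the same 64 bits that Go's `math.Float64bits` returns. The tolerance of 4 ulps and the helper names are hypothetical.

```python
import math
import struct

def ordered_key(x: float) -> int:
    """Map a double to an integer so that adjacent doubles map to adjacent integers."""
    (u,) = struct.unpack("<Q", struct.pack("<d", x))  # raw bits, like Go's math.Float64bits
    magnitude = u & 0x7FFF_FFFF_FFFF_FFFF
    return -magnitude if u >> 63 else magnitude       # -0.0 and 0.0 both map to 0

def ulp_distance(a: float, b: float) -> int:
    """Number of steps between a and b along the sorted sequence of doubles (not for NaNs)."""
    return abs(ordered_key(a) - ordered_key(b))

def nearly_equal(a: float, b: float, max_ulps: int = 4) -> bool:
    """True if a and b are within max_ulps steps of each other; NaN is never equal."""
    if math.isnan(a) or math.isnan(b):
        return False
    return ulp_distance(a, b) <= max_ulps

print(0.1 + 0.2 == 0.3)              # False
print(ulp_distance(0.1 + 0.2, 0.3))  # 1
print(nearly_equal(0.1 + 0.2, 0.3))  # True
print(nearly_equal(0.0, -0.0))       # True
```

This scheme treats the largest finite double and infinity as one step apart, and near zero an absolute tolerance is often preferred; which rule fits depends on the application.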
14 votes, 1 answer

Is there any IEEE 754 standard implementations for Java floating point primitives?

I'd like to know whether Java uses the IEEE 754 standard to implement its floating-point arithmetic. Here I saw this kind of thing in the documentation: "operation defined in IEEE 754-2008". As I understand it, the positive side of IEEE 754 is to increase…
GROX13