Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (commonly float) and double-precision binary64 (commonly double).

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic, and the one that most hardware and language implementations follow.
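As a minimal sketch of what the binary32 and binary64 encodings look like, the following Python snippet reinterprets a value's bytes as an integer and splits a binary32 pattern into its sign, exponent and fraction fields. The helper names (float_to_bits32, float_to_bits64, split_binary32) are made up for this illustration; only the standard struct module is assumed.

```python
import struct

def float_to_bits32(x: float) -> int:
    """Round x to binary32 and return its 32-bit pattern as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def float_to_bits64(x: float) -> int:
    """Return the 64-bit binary64 pattern of x as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def split_binary32(bits: int) -> tuple:
    """Split a binary32 pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 bits, leading 1 is implicit
    return sign, exponent, fraction

print(hex(float_to_bits32(1.0)))                   # 0x3f800000
print(hex(float_to_bits64(1.0)))                   # 0x3ff0000000000000
print(split_binary32(float_to_bits32(-0.15625)))   # (1, 124, 2097152)
```

The last line decodes -0.15625 = -1.01b x 2^-3: the sign bit is 1, the biased exponent is 127 - 3 = 124, and the fraction field holds the .01b part.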

As well as formats, IEEE 754 also defines the basic operations (+, -, *, / and sqrt) as producing correctly rounded results (error <= 0.5 ulp in the default round-to-nearest mode). Other functions such as pow and sin are not required to be as accurate; that is an implementation trade-off between precision and performance.
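The 0.5 ulp bound can be checked with exact rational arithmetic. The sketch below assumes Python 3.9 or later (for math.ulp) and a platform whose Python float is binary64 with round-to-nearest arithmetic; the function names are hypothetical and chosen only for this example.

```python
import math
from fractions import Fraction

def within_half_ulp(result: float, exact: Fraction) -> bool:
    """True if result differs from the exact value by at most 0.5 ulp."""
    error = abs(Fraction(result) - exact)
    return error <= Fraction(math.ulp(result)) / 2

def sqrt_within_half_ulp(x: float) -> bool:
    """Check sqrt without computing an irrational value: square the bounds."""
    r = math.sqrt(x)
    half = Fraction(math.ulp(r)) / 2
    return (Fraction(r) - half) ** 2 <= Fraction(x) <= (Fraction(r) + half) ** 2

a, b = 0.1, 0.2
# Fraction(a) is the exact value of the double a, so these are exact results.
print(within_half_ulp(a + b, Fraction(a) + Fraction(b)))
print(within_half_ulp(a - b, Fraction(a) - Fraction(b)))
print(within_half_ulp(a * b, Fraction(a) * Fraction(b)))
print(within_half_ulp(a / b, Fraction(a) / Fraction(b)))
print(sqrt_within_half_ulp(2.0))
```

Under those assumptions every check should report True. The same test applied to functions such as math.sin or math.pow is not guaranteed to pass, which is the distinction the paragraph above draws.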

This is why many CPU instruction sets provide only the basic operations (including sqrt) as hardware instructions.

1447 questions
31
votes
4 answers

How to perform round to even with floating point numbers

In regards to IEEE-754 single precision floating point, how do you perform round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode)? Basically I have the guard bit, round…
Veridian
  • 3,531
  • 12
  • 46
  • 80
31
votes
3 answers

In Scala, why is NaN not being picked up by pattern matching?

My method is as follows def myMethod(myDouble: Double): Double = myDouble match { case Double.NaN => ... case _ => ... } The IntelliJ debugger is showing NaN but this is not being picked up in my pattern matching. Are there possible…
deltanovember
  • 42,611
  • 64
  • 162
  • 244
31
votes
1 answer

Status of __STDC_IEC_559__ with modern C compilers

C99 added a macro __STDC_IEC_559__ which can be used to test if a compiler and standard library conform to the ISO/IEC/IEEE 60559 (or IEEE 754) standard. According to the answers for this…
Z boson
  • 32,619
  • 11
  • 123
  • 226
29
votes
3 answers

Get next smallest Double number

As part of a unit test, I need to test some boundary conditions. One method accepts a System.Double argument. Is there a way to get the next-smallest double value? (i.e. decrement the mantissa by 1 unit-value)? I considered using Double.Epsilon but…
Dai
  • 141,631
  • 28
  • 261
  • 374
28
votes
2 answers

How to convert a floating point number to its binary representation (IEEE 754) in Javascript?

What's the easiest way to convert a floating point number to its binary representation in Javascript? (e.g. 1.0 -> 0x3F800000). I have tried to do it manually, and this works to some extent (with usual numbers), but it fails for very big or very…
GameZelda
  • 824
  • 1
  • 7
  • 13
28
votes
5 answers

How many unique values are there between 0 and 1 of a standard float?

I guess another way of phrasing this question is what decimal place can you go to using a float that will only be between 0 and 1? I've tried to work it out by looking at the MSDN. Which says the precision is 7 digits. I thought that meant it could…
MatthewMcGovern
  • 3,466
  • 1
  • 19
  • 19
28
votes
3 answers

Why is "Divide by Zero" or any other exception not raised?

I have a double[] on which a LINQ operation is being performed: MD = MD.Select(n => n * 100 / MD.Sum()).ToArray(); In some cases, all elements of MD are 0 and then Sum is also zero. Then 0 * 100 = 0 / 0, but it is not giving a divide-by-zero…
Nikhil Agrawal
  • 47,018
  • 22
  • 121
  • 208
27
votes
8 answers

Ensuring C++ doubles are 64 bits

In my C++ program, I need to pull a 64 bit float from an external byte sequence. Is there some way to ensure, at compile-time, that doubles are 64 bits? Is there some other type I should use to store the data instead? Edit: If you're reading this…
Whatsit
  • 10,227
  • 11
  • 42
  • 41
27
votes
4 answers

Floating point comparison revisited

This topic has come up many times on StackOverflow, but I believe this is a new take. Yes, I have read Bruce Dawson's articles and What Every Computer Scientist Should Know About Floating-Point Arithmetic and this nice answer. As I understand it,…
Nemo
  • 70,042
  • 10
  • 116
  • 153
27
votes
8 answers

Next higher/lower IEEE double precision number

I am doing high precision scientific computations. In looking for the best representation of various effects, I keep coming up with reasons to want to get the next higher (or lower) double precision number available. Essentially, what I want to do…
Mark T
  • 3,464
  • 5
  • 31
  • 45
25
votes
5 answers

Uses for negative zero floating point value?

Consider the following C++ code: double someZero = 0; std::cout << 0 - someZero << '\n'; // prints 0 std::cout << -someZero << std::endl; // prints -0 The question arises: what is negative zero good for, and should it be defensively avoided (i.e.…
25
votes
4 answers

Algorithm to convert an IEEE 754 double to a string?

Many programming languages that use IEEE 754 doubles provide a library function to convert those doubles to strings. For example, C has sprintf, C++ has stringstream, Java has Double.toString, etc. Internally, how are these functions implemented? …
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
25
votes
4 answers

Reading 32 bit signed ieee 754 floating points from a binary file with python?

I have a binary file which is simply a list of signed 32 bit ieee754 floating point numbers. They are not separated by anything, and simply appear one after another until EOF. How would I read from this file and interpret them correctly as floating…
Razor Storm
  • 12,167
  • 20
  • 88
  • 148
25
votes
6 answers

Read/Write bytes of float in JS

Is there any way I can read bytes of a float value in JS? What I need is to write a raw FLOAT or DOUBLE value into some binary format I need to make, so is there any way to get a byte-by-byte IEEE 754 representation? And same question for writing of…
Michael Pliskin
  • 2,352
  • 4
  • 26
  • 42
25
votes
6 answers

Is it safe to assume floating point is represented using IEEE754 floats in C?

Floating point is implementation-defined in C, so there aren't any guarantees. Our code needs to be portable, and we are discussing whether or not it is acceptable to use IEEE754 floats in our protocol. For performance reasons it would be nice if we don't…
Calmarius
  • 18,570
  • 18
  • 110
  • 157