Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard; its best-known formats are single-precision binary32 (usually called float) and double-precision binary64 (usually called double).

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic, and the one most widely implemented in hardware and software.

Wikipedia on IEEE 754 (2008)
ieee.org documentation
https://en.wikipedia.org/wiki/Single-precision_floating-point_format (binary32, usually called float or real4): has nice diagrams of the bit pattern and gives the range over which every integer is exactly representable.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format (binary64, usually called double or real8)
Algorithm to convert an IEEE 754 double to a string? (the answers include the recent Ryū: fast float-to-string conversion)
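
As a rough illustration of the binary32 bit pattern and the exact-integer range mentioned for the single-precision link above, here is a minimal Python sketch. The helper names float32_bits, split_fields and to_float32 are made up for this example, and it assumes only the standard struct module.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def split_fields(bits: int) -> tuple:
    """Split a binary32 bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, bias 127
    fraction = bits & 0x7FFFFF        # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(hex(float32_bits(1.0)))            # 0x3f800000
print(split_fields(float32_bits(-2.5)))  # (1, 128, 2097152)

# With 24 significant bits, every integer up to 2**24 is exact in binary32;
# 2**24 + 1 is the first integer that is not representable.
print(to_float32(2.0**24) == 2.0**24)          # True
print(to_float32(2.0**24 + 1) == 2.0**24 + 1)  # False
```

The same approach with the "d" and "Q" struct codes would expose the 1/11/52-bit layout of binary64.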

As well as formats, IEEE 754 also defines the basic operations, + - * / and sqrt, as producing correctly-rounded results (error <= 0.5 ulp). Other functions like pow and sin are not required to be as accurate; how accurate they are is an implementation trade-off between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
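
The 0.5 ulp bound on the basic operations can be checked with exact rational arithmetic. The following is a minimal sketch rather than a conformance test: the helper names error_in_ulps and sqrt_is_correctly_rounded are hypothetical, and it assumes Python 3.9+ (for math.ulp) on a platform whose float is IEEE 754 binary64 in the default rounding mode.

```python
import math
from fractions import Fraction

def error_in_ulps(computed: float, exact: Fraction) -> Fraction:
    """Exact distance between computed and the true result, in ulps of computed."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

def sqrt_is_correctly_rounded(x: float) -> bool:
    """Check that the true square root of x lies within half an ulp of math.sqrt(x)."""
    s = math.sqrt(x)
    half = Fraction(math.ulp(s)) / 2
    lo, hi = Fraction(s) - half, Fraction(s) + half
    # sqrt(x) is in [lo, hi] exactly when x is in [lo**2, hi**2] (for lo >= 0).
    return lo * lo <= Fraction(x) <= hi * hi

half_ulp = Fraction(1, 2)
a, b = 0.1, 0.2

# Fraction(a) is the exact value of the double a, so these are the true results.
add_err = error_in_ulps(a + b, Fraction(a) + Fraction(b))
mul_err = error_in_ulps(a * b, Fraction(a) * Fraction(b))
div_err = error_in_ulps(1.0 / 3.0, Fraction(1, 3))

print(add_err <= half_ulp, mul_err <= half_ulp, div_err <= half_ulp)
print(sqrt_is_correctly_rounded(2.0))
```

No comparable check is guaranteed to pass for functions such as math.sin or math.pow, whose accuracy depends on the underlying math library.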

1447 questions

4 answers

How to perform round to even with floating point numbers

In regards to IEEE-754 single precision floating point, how do you perform round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode)? Basically I have the guard bit, round…

asked Jan 24 '12 at 04:12

Veridian

3 answers

In Scala, why is NaN not being picked up by pattern matching?

My method is as follows: def myMethod(myDouble: Double): Double = myDouble match { case Double.NaN => ... case _ => ... } The IntelliJ debugger is showing NaN, but this is not being picked up in my pattern matching. Are there possible…

scala pattern-matching ieee-754 nan

asked Aug 02 '11 at 06:46

deltanovember

1 answer

Status of __STDC_IEC_559__ with modern C compilers

C99 added a macro __STDC_IEC_559__ which can be used to test if a compiler and standard library conform to the ISO/IEC/IEEE 60559 (or IEEE 754) standard. According to the answers for this…

c gcc floating-point clang ieee-754

asked Jul 02 '15 at 10:18

Z boson

3 answers

Get next smallest Double number

As part of a unit test, I need to test some boundary conditions. One method accepts a System.Double argument. Is there a way to get the next-smallest double value (i.e. decrement the mantissa by 1 unit-value)? I considered using Double.Epsilon but…

c# double ieee-754 floating-point-precision epsilon

asked Mar 11 '13 at 03:18

Dai

2 answers

How to convert a floating point number to its binary representation (IEEE 754) in Javascript?

What's the easiest way to convert a floating point number to its binary representation in Javascript? (e.g. 1.0 -> 0x3F800000). I have tried to do it manually, and this works to some extent (with usual numbers), but it fails for very big or very…

javascript binary floating-point ieee-754

asked Jun 22 '10 at 19:56

GameZelda

5 answers

How many unique values are there between 0 and 1 of a standard float?

I guess another way of phrasing this question is what decimal place can you go to using a float that will only be between 0 and 1? I've tried to work it out by looking at MSDN, which says the precision is 7 digits. I thought that meant it could…

c# floating-point ieee-754

asked Jul 30 '13 at 14:27

MatthewMcGovern

3 answers

Why is "Divide by Zero" or any other exception not raised?

I have a double[] on which a LINQ operation is being performed: MD = MD.Select(n => n * 100 / MD.Sum()).ToArray(); In some cases, all elements of MD are 0 and then Sum is also zero. Then 0 * 100 = 0 / 0, but it is not giving a divide-by-zero…

c# .net linq ieee-754

asked Apr 20 '12 at 08:03

Nikhil Agrawal

8 answers

Ensuring C++ doubles are 64 bits

In my C++ program, I need to pull a 64 bit float from an external byte sequence. Is there some way to ensure, at compile-time, that doubles are 64 bits? Is there some other type I should use to store the data instead? Edit: If you're reading this…

c++ types precision compiler-construction ieee-754

asked Apr 15 '09 at 15:44

Whatsit

4 answers

Floating point comparison revisited

This topic has come up many times on StackOverflow, but I believe this is a new take. Yes, I have read Bruce Dawson's articles and What Every Computer Scientist Should Know About Floating-Point Arithmetic and this nice answer. As I understand it,…

c++ floating-point language-lawyer ieee-754

asked Dec 18 '12 at 19:43

Nemo

8 answers

Next higher/lower IEEE double precision number

I am doing high precision scientific computations. In looking for the best representation of various effects, I keep coming up with reasons to want to get the next higher (or lower) double precision number available. Essentially, what I want to do…

floating-point double precision ieee-754

asked Aug 07 '09 at 16:59

Mark T

5 answers

Uses for negative zero floating point value?

Consider the following C++ code:
double someZero = 0;
std::cout << 0 - someZero << '\n'; // prints 0
std::cout << -someZero << std::endl; // prints -0
The question arises: what is negative zero good for, and should it be defensively avoided (i.e.…

c++ floating-point ieee-754

asked Oct 30 '11 at 18:36

catfish_deluxe_call_me_cd

4 answers

Algorithm to convert an IEEE 754 double to a string?

Many programming languages that use IEEE 754 doubles provide a library function to convert those doubles to strings. For example, C has sprintf, C++ has stringstream, Java has Double.toString, etc. Internally, how are these functions implemented? …

string algorithm language-agnostic floating-point ieee-754

asked Aug 22 '11 at 21:44

templatetypedef

4 answers

Reading 32 bit signed ieee 754 floating points from a binary file with python?

I have a binary file which is simply a list of signed 32 bit ieee754 floating point numbers. They are not separated by anything, and simply appear one after another until EOF. How would I read from this file and interpret them correctly as floating…

python parsing floating-point binaryfiles ieee-754

asked Jun 08 '11 at 22:25

Razor Storm

6 answers

Read/Write bytes of float in JS

Is there any way I can read bytes of a float value in JS? What I need is to write a raw FLOAT or DOUBLE value into some binary format I need to make, so is there any way to get a byte-by-byte IEEE 754 representation? And same question for writing of…

javascript serialization floating-point math ieee-754

asked Dec 10 '10 at 23:03

Michael Pliskin

6 answers

Is it safe to assume floating point is represented using IEEE754 floats in C?

Floating point is implementation-defined in C, so there aren't any guarantees. Our code needs to be portable, and we are discussing whether or not it is acceptable to use IEEE754 floats in our protocol. For performance reasons it would be nice if we don't…

c floating-point ieee-754

asked Aug 12 '15 at 13:44

Calmarius
