Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (aka float) and double-precision binary64 (aka double).

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point computation, and is the most widely implemented standard of its kind.

As well as formats, IEEE 754 defines the basic operations (+, -, *, / and sqrt) as producing correctly rounded results (error <= 0.5 ulp). Other functions such as pow and sin are not required to be as accurate; how they trade precision against performance is an implementation choice.

This is why many CPU instruction sets only include the basic operations (including sqrt).
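
As a rough illustration of what "correctly rounded" means for addition, the sketch below compares a floating-point sum with the exact rational sum of its operands. The helper name `is_correctly_rounded_sum` is made up for this example, and the code assumes Python 3.9+ (for `math.nextafter`), binary64 floats, and a result whose neighbours are both finite.

```python
from fractions import Fraction
import math


def is_correctly_rounded_sum(a: float, b: float) -> bool:
    """Return True if a + b is at least as close to the exact sum as its neighbours."""
    exact = Fraction(a) + Fraction(b)        # exact rational sum of the two doubles
    result = a + b                           # what the hardware/runtime produced
    err = abs(Fraction(result) - exact)
    below = math.nextafter(result, -math.inf)  # adjacent double below the result
    above = math.nextafter(result, math.inf)   # adjacent double above the result
    return (err <= abs(Fraction(below) - exact)
            and err <= abs(Fraction(above) - exact))


print(is_correctly_rounded_sum(0.1, 0.2))  # True
```

The check only shows that no adjacent double is closer to the exact sum; it does not inspect tie-breaking.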

1447 questions
8
votes
4 answers

Explain a surprising parity in the rounding direction of apparent ties in the interval [0, 1]

Consider the collection of floating-point numbers of the form 0.xx5 between 0.0 and 1.0: [0.005, 0.015, 0.025, 0.035, ..., 0.985, 0.995] I can make a list of all 100 such numbers easily in Python: >>> values = [n/1000 for n in range(5, 1000,…
Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
8
votes
1 answer

Probability that a formula fails in IEEE 754

On my computer, I can check that (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3) evaluates to False. More generally, I can estimate that the formula (a + b) + c == a + (b + c) fails roughly 17% of the time when a,b,c are chosen uniformly and independently…
hilberts_drinking_problem
  • 11,322
  • 3
  • 22
  • 51
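
A minimal sketch of the experiment described in the question above, written in Python. The excerpt is truncated before it states the sampling interval, so drawing from [0, 1) with `random.random()` is an assumption here, as are the function name and the fixed seed; the rate this prints is not claimed to match the figure quoted in the question.

```python
import random


def associativity_failure_rate(trials: int, seed: int = 0) -> float:
    """Estimate how often (a + b) + c != a + (b + c) for random doubles in [0, 1)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    failures = 0
    for _ in range(trials):
        a, b, c = rng.random(), rng.random(), rng.random()
        if (a + b) + c != a + (b + c):
            failures += 1
    return failures / trials


# The specific case from the question:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Monte Carlo estimate under the assumed distribution:
print(f"{associativity_failure_rate(100_000):.1%}")
```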
8
votes
5 answers

Is there a Dart function to convert List<int> to Double?

I'm getting a Bluetooth Characteristic from a Bluetooth controller with flutter blue as a List<int>. This characteristic contains the weight measurement of a Bluetooth scale. Is there a function to convert this list of ints to a Double? I tried to find some…
nvano
  • 337
  • 6
  • 13
8
votes
3 answers

Floating point number in JavaScript (IEEE 754)

If I understand correctly, JavaScript numbers are always stored as double-precision floating-point numbers, following the international IEEE 754 standard, which means it uses 52 bits for the fraction (significand). But in the picture above, it seems like…
hungneox
  • 9,333
  • 12
  • 49
  • 66
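
To make the 1 + 11 + 52 bit layout of a binary64 value concrete, here is a small Python sketch (not JavaScript, and not taken from the question) that splits a double into its three fields. The function name `decompose` is made up for this example.

```python
import struct


def decompose(x: float) -> tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # 52 explicitly stored bits
    return sign, exponent, fraction


print(decompose(1.0))   # (0, 1023, 0)
print(decompose(-2.0))  # (1, 1024, 0)
print(f"{decompose(0.1)[2]:052b}")  # the 52 stored fraction bits of 0.1
```

For normal numbers the significand has an implicit leading 1 that is not stored, which is why 52 stored bits give 53 bits of precision.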
8
votes
1 answer

What exactly does (1.0e300 + pow(2.0, -30.0) > 1.0) do in STDC?

I have come across a function which computes atan(x) (the source is here). Reducing it to the core of my question and slightly reformatting it, they have something like this: static const double one = 1.0, huge =…
Binarus
  • 4,005
  • 3
  • 25
  • 41
8
votes
4 answers

Convert from the IBM floating point to the IEEE floating point standard and vice versa in C#

I was looking for a way to convert IEEE floating point numbers to IBM floating point format for an old system we are using. Is there a general formula we can use in C# to this end?
aiw
  • 129
  • 1
  • 2
  • 7
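
As a sketch of the kind of formula the question above asks about, the following Python snippet (Python rather than C#, for illustration only) decodes one direction, IBM to IEEE. It assumes the 32-bit IBM hexadecimal floating-point layout: 1 sign bit, a 7-bit excess-64 exponent in base 16, and a 24-bit fraction. The function name `ibm32_to_float` is hypothetical.

```python
def ibm32_to_float(word: int) -> float:
    """Decode a 32-bit IBM hexadecimal float (given as an integer) to a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 24) & 0x7F   # excess-64 exponent, power of 16
    fraction = word & 0xFFFFFF       # 24-bit fraction, value in [0, 1)
    # value = sign * 0.fraction * 16^(exponent - 64)
    return sign * (fraction / 2**24) * 16.0 ** (exponent - 64)


print(ibm32_to_float(0xC276A000))  # -118.625
```

The reverse direction needs extra care with normalisation and rounding, since the two formats have different exponent ranges and precision, and is not shown here.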
8
votes
1 answer

Java/C: OpenJDK native tanh() implementation wrong?

I was digging through some of the Java Math functions native C source code. Especially tanh(), as I was curious to see how they implemented that one. However, what I found surprised me: double tanh(double x) { ... if (ix < 0x40360000) { …
Martijn Courteaux
  • 67,591
  • 47
  • 198
  • 287
8
votes
3 answers

Is there any way to see a number in its 64-bit float IEEE 754 representation?

JavaScript stores all numbers as double-precision 64-bit format IEEE 754 values according to the spec: The Number type has exactly 18437736874454810627 (that is, 2^64 − 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values…
Max Koretskyi
  • 101,079
  • 60
  • 333
  • 488
8
votes
2 answers

For any finite floating point value, is it guaranteed that x - x == 0?

Floating point values are inexact, which is why we should rarely use strict numerical equality in comparisons. For example, in Java this prints false (as seen on ideone.com): System.out.println(.1 + .2 == .3); // false Usually the correct way to…
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
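
The property asked about in the title above can be spot-checked empirically. The sketch below is in Python rather than Java and samples random bit patterns, skipping NaNs and infinities; the helper name `random_finite_double` is made up. A passing run is evidence, not a proof.

```python
import math
import random
import struct


def random_finite_double(rng: random.Random) -> float:
    """Draw a double from a random 64-bit pattern, retrying on NaN or infinity."""
    while True:
        bits = rng.getrandbits(64)
        (x,) = struct.unpack(">d", struct.pack(">Q", bits))
        if math.isfinite(x):
            return x


rng = random.Random(0)
samples = [random_finite_double(rng) for _ in range(100_000)]
print(all(x - x == 0.0 for x in samples))  # True

# The restriction to finite values matters: infinity minus infinity is NaN.
print(math.isnan(math.inf - math.inf))     # True
```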
8
votes
1 answer

Convert IEEE 754 to decimal floating point

I have what I think is an IEEE 754 value in single or double precision (not sure) and I'd like to convert it to decimal in PHP. Given 4 hex values (which might be in little endian format, so basically reversed order) 4A,5B,1B,05 I need to convert it to…
matt
  • 2,312
  • 5
  • 34
  • 57
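
For the four bytes quoted in the question above, a quick way to see both candidate readings is to decode them as binary32 in each byte order. This is a Python sketch rather than PHP, and it assumes the value really is single precision, which the asker was not sure about.

```python
import struct

raw = bytes([0x4A, 0x5B, 0x1B, 0x05])

# Interpret the same four bytes as a binary32 value in both byte orders.
(as_big_endian,) = struct.unpack(">f", raw)
(as_little_endian,) = struct.unpack("<f", raw)

print(as_big_endian)     # 3589825.25
print(as_little_endian)  # a very small positive number
```

Which reading is the intended one depends on the device that produced the bytes; the excerpt does not say.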
8
votes
2 answers

Minimum and maximum of signed zero

I am concerned about the following cases: min(-0.0,0.0), max(-0.0,0.0), minmag(-x,x), maxmag(-x,x). According to Wikipedia, IEEE 754-2008 says with regard to min and max: The min and max operations are defined but leave some leeway for the case where the…
Z boson
  • 32,619
  • 11
  • 123
  • 226
8
votes
3 answers

Why are numbers with many significant digits handled differently in C# and JavaScript?

If JavaScript's Number and C#'s double are specified the same (IEEE 754), why are numbers with many significant digits handled differently? var x = (long)1234123412341234123.0; // 1234123412341234176 - C# var x = 1234123412341234123.0; //…
Hans Malherbe
  • 2,988
  • 24
  • 19
8
votes
4 answers

What is the risk of numerical instabilities when predividing denominators?

Suppose I want to divide many numbers by the same divisor. a /= x; b /= x; c /= x; ... Since multiplication is faster, the temptation is to do this tmp = 1.0f / x; a *= tmp; b *= tmp; c *= tmp; ... 1) Is this guaranteed to produce identical answers? I…
spraff
  • 32,570
  • 22
  • 121
  • 229
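
A small experiment related to the first point of the question above: dividing directly rounds once, while multiplying by a precomputed reciprocal rounds twice, so the results can differ. The sketch uses Python doubles rather than the C-style `float` in the excerpt, and the function name `count_mismatches` is made up.

```python
def count_mismatches(x: float, n: int) -> int:
    """Count integers a in 1..n for which a / x differs from a * (1 / x)."""
    reciprocal = 1.0 / x  # rounded once here, then again in each multiplication
    return sum(1 for a in range(1, n + 1) if a / x != a * reciprocal)


print(3.0 / 10.0)          # 0.3
print(3.0 * (1.0 / 10.0))  # 0.30000000000000004
print(count_mismatches(10.0, 100))
```

When the divisor is a power of two the reciprocal is exact, so (barring overflow or underflow) the two forms agree.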
8
votes
1 answer

Why do we need IEEE 754 remainder?

I just read this topic (especially the last comments). Then I was wondering why we actually need this way of giving the remainder. But it seems that not many people "on google" were interested in that before...
TheTrowser
  • 363
  • 4
  • 14
8
votes
5 answers

0.0 and -0.0 in Java (IEEE 754)

Java is totally compatible with IEEE 754, right? But I'm confused about how Java decides the sign of floating-point addition and subtraction. Here is my test result: double a = -1.5; double b = 0.0; double c = -0.0; System.out.println(b * a); …
MagicFingr
  • 479
  • 6
  • 20
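
The sign rules behind the question above can be observed with a few lines of Python (used here instead of Java; both use IEEE 754 binary64 with round-to-nearest by default). Because `0.0 == -0.0` is true, the sketch reads the sign bit with `math.copysign`; the helper name `sign_of` is made up.

```python
import math


def sign_of(x: float) -> str:
    """Return '-' or '+' according to the sign bit, which also distinguishes -0.0."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


a, b, c = -1.5, 0.0, -0.0  # same values as in the question

print(sign_of(b * a))  # -  (product of operands with opposite signs is negative)
print(sign_of(b + c))  # +  (+0 plus -0 is +0 under round-to-nearest)
print(sign_of(c + c))  # -  (-0 plus -0 keeps the common sign)
print(sign_of(c - b))  # -  (-0 minus +0 is -0 plus -0)
print(b == c)          # True: the two zeros compare equal
```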