Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard, notably its single-precision binary32 (aka float) and double-precision binary64 (aka double) formats.

IEEE 754 is the Institute of Electrical and Electronics Engineers standard for floating-point computation, and is the most widely used standard of its kind.

As well as formats, IEEE 754 also defines the basic operations, + - * / and sqrt, as producing correctly rounded results (error <= 0.5 ulp). Other functions like pow and sin are not required to be as accurate; that's an implementation trade-off between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
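As a rough illustration of the formats and the 0.5 ulp bound described above, here is a minimal Python sketch. The helper names `binary32_fields` and `is_correctly_rounded_sum` are hypothetical (not part of any library), and the sketch assumes Python 3.9+ (for `math.ulp`), finite inputs, and a platform whose `float` is binary64 with round-to-nearest, which is the usual case.

```python
import math
import struct
from fractions import Fraction


def binary32_fields(x):
    """Round x to binary32 and split its bits into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction


def is_correctly_rounded_sum(a, b):
    """Check that the binary64 result of a + b is within 0.5 ulp of the exact sum.

    Assumes a, b and a + b are all finite.
    """
    computed = a + b
    exact = Fraction(a) + Fraction(b)          # exact rational arithmetic
    error = abs(Fraction(computed) - exact)
    return error <= Fraction(math.ulp(computed)) / 2


if __name__ == "__main__":
    print(binary32_fields(1.0))
    print(binary32_fields(20.0))
    print(is_correctly_rounded_sum(0.1, 0.2))
```

The second function uses `fractions.Fraction` to compute the exact sum of the two stored values, so the check measures only the rounding error of the addition itself, not the error already present in a decimal literal such as 0.1.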

1447 questions
-1
votes
2 answers

IEEE 754 - Floating point number representation

I am writing a program which takes a floating-point number from input and outputs the hex representation of this number. What I did to solve it was: Divide the number into whole and decimal parts. Convert the whole and decimal parts to binary representation. a) Whole…
Advent
  • 140
  • 15
-1
votes
1 answer

Is the double value A/B guaranteed to always be equal to A/B?

As we know, because of the limited precision of double, the following two calculations may not give exactly the same value: A / B / C and A / ( B * C ). My question is, even with the same two variables, A & B, can the compiler guarantee every time …
Xie Jason
  • 3
  • 1
-1
votes
2 answers

Is 0.0 / 0.0 a well-defined value?

Since 0.0 / 0.0 is mathematically undefined, the IEEE 754 floating-point standard reasonably defines NaN as its result. Now, because unlike infinity, NaN is not a well-defined value, but a set of values, the question of whether 0.0 / 0.0 is a well…
-1
votes
1 answer

IEEE 754 to decimal. I'm not sure if I did this correctly

If a number in the single-precision format is shown as hex 7F41F000, determine its numeric value. So I converted it to binary: 0111 1111 0100 0001 1111 0000 0000 0000 By looking at the binary representation of the given number I find the following: s…
-1
votes
1 answer

Reinterpreting the bitstring of an unsigned int range [uint(a1) uint(a2)] as a floating point range [float(a1) float(a2)]: is this possible?

Hi, when reinterpreting a 32-bit string of bits one could end up having a valid floating point number: uint: 1101004800, float: 20.000000 Now say I'm working with a static-analysis tool that defines operations on ranges of values instead of single…
-1
votes
1 answer

Half precision multiplication seems to produce wrong result

First of all, an IEEE 754 half-precision floating point number uses 16 bits. It uses 1 sign bit, 5 exponent bits, and 10 mantissa bits. The actual value can be calculated as sign * 2^(exponent-15) * (1+mantissa/1024). I'm trying to run an image detection…
Chan Kim
  • 5,177
  • 12
  • 57
  • 112
-1
votes
1 answer

Encoding float constants as extremely long binary strings

Recently, I've been trying to implement the 15 tests for randomness described in NIST SP800-22. As a check of my function implementation, I have been running the examples that the NIST document provides for each of its tests. Some of these tests…
Pat B.
  • 419
  • 3
  • 12
-1
votes
1 answer

Are there known safety issues caused by floating-point overflow?

The IEEE 754 standard seems to give floating-point overflow a pass by introducing the infinity representation. It seems to me that overflow is more tolerated in floats than in integers. My question is, how dangerous is floating-point overflow? Are…
zell
  • 9,830
  • 10
  • 62
  • 115
-1
votes
1 answer

Why can't we store the exponent of an IEEE floating point number without adding a bias or converting it to 2's or 1's complement?

Why do we have to add a bias or convert the exponent of an IEEE floating point number to its 2's or 1's complement form? Why can't we store it like this in single precision: 1.1 * 2^0 => 0 00000000 10000000000000000000000 instead of this: 1.1 * 2^ (0 +…
user7374714
-1
votes
1 answer

.NET Core float variable doesn't print decimals

I assign 2097151.3 to the float variable and the application prints only the integer part. Possible bug? public static void Main(string[] args) { float foo = 2097151.3F; Console.WriteLine(foo); // prints 2097151 …
Joe
  • 1
  • 1
-1
votes
3 answers

Denormalization IEEE

I'm working on a Digital Design project (Verilog) involving IEEE double precision floating point standard. I have a query regarding IEEE floating number representation. In IEEE floating point representation, the numbers are represented in normalized…
Displayname
  • 25
  • 1
  • 10
-1
votes
1 answer

Converting float to IEEE format in C

I need to transform a float to IEEE format (the float is given by a scanf, and it has to come from the scanf, otherwise it will produce an error) and I don't seem to get it to work. I tried using argc and argv and it was correct, but my submitting platform…
-1
votes
1 answer

How do I convert a single-precision floating point number into decimal?

If this value 0010 0100 1001 0010 0100 1001 0010 0100 is a single-precision floating point number, how do I convert it into decimal?
Roxas
  • 1
  • 1
-1
votes
1 answer

How to refine the result of a floating point division?

I have an algorithm for calculating the floating point square root divide using the Newton-Raphson algorithm. My results are not fully accurate and are sometimes off by 1 ulp. I was wondering if there is a refinement algorithm for floating point…
Veridian
  • 3,531
  • 12
  • 46
  • 80
-1
votes
1 answer

Mathematical algorithm returning NaN only in dot42 project

I have made a sample project using dot42 and have taken this C# code and placed it inside a library: var gcd = GetGreatCircleDistanceKm(52.0, -1.90, 21.0, 39.0); // returns NaN should be ~4915 public double GetGreatCircleDistanceKm(double…
sprocket12
  • 5,368
  • 18
  • 64
  • 133