Questions tagged [floating-point-conversion]

Anything related to converting a floating point number to and from other representations.

290 questions
4
votes
1 answer

Without underflow or overflow, are there any two numbers where A < B in decimal form but A > B after conversion to floating point?

For example, 0.00000000000000000000000000000000000000000000000000000000000000000000000000000011 is greater than 0.0000000000000000000000000000000000000000000000000000000000000000000000000000001 both in decimal and after converting to floating point…
ocomfd
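
The property asked about can be probed with a short Python sketch. It is not taken from the question: the helper name `order_preserved` and the test values are illustrative assumptions. Python's `float()` rounds a decimal string to the nearest binary64 value, and round-to-nearest is monotonic, so A < B can become A == B but should not become A > B.

```python
from decimal import Decimal

def order_preserved(a_text: str, b_text: str) -> bool:
    """Check that a < b as exact decimals implies float(a) <= float(b)."""
    if not Decimal(a_text) < Decimal(b_text):
        raise ValueError("expected a < b as exact decimals")
    return float(a_text) <= float(b_text)

# Two tiny values that stay distinct after conversion.
print(order_preserved("1e-79", "1.1e-79"))

# Two decimals that are closer together than one binary64 step near 0.1:
# the order is not reversed, but the values collapse to the same double.
print(order_preserved("0.1", "0.1000000000000000001"))
print(float("0.1") == float("0.1000000000000000001"))
```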
4
votes
1 answer

Convert Integer to Float in Elisp

I am having trivial problems converting integer division to a floating point solution in Emacs Lisp 24.5.1. (message "divide: %2.1f" (float (/ 1 2))) "divide: 0.0" I believe this expression is first calculating 1/2, finds it is 0 after truncating,…
c3ad-io
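
The same order-of-operations effect can be shown in Python as an analogy (this is not Elisp, and the variable names are made up for illustration). Dividing two integers first discards the fraction, so converting afterwards cannot bring it back; converting one operand before dividing keeps it. Note that Python's `//` floors rather than truncates, which gives the same result for positive operands.

```python
# Integer division happens first, so the fraction is already gone
# by the time the result is converted to a float.
divided_then_converted = float(1 // 2)

# Converting one operand before dividing keeps the fraction.
converted_then_divided = float(1) / 2

print(f"divide: {divided_then_converted:2.1f}")
print(f"divide: {converted_then_divided:2.1f}")
```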
4
votes
2 answers

Extended precision floating point dangers in C#

I am writing a library for multiprecision arithmetic based on a paper I am reading. It is very important that I am able to guarantee the properties of floating point numbers I use. In particular, that they adhere to the IEEE 754 standard for double…
Void Star
4
votes
3 answers

Java float conversion to string gives different results

Why does the Java code below print different values (decimal part) for the same float variable? public static void main(String[] args) { float f = 2001797.11f; System.out.println(String.format("%013.2f", f)); System.out.println(Float.toString(f)); …
sapan
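
A Python sketch (an illustration, not the Java code from the question) of what a 32-bit float actually stores for this literal. The helper `to_float32` is a made-up name; it uses `struct` to round a binary64 value to the nearest binary32 value. Near 2,000,000 the spacing between adjacent binary32 values is 0.125, so 2001797.11 cannot be stored exactly, and different formatting routines may then print different digits from the same stored value.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = to_float32(2001797.11)

# Decimal(float) shows the exact value held in the variable.
print(Decimal(stored))
```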
4
votes
2 answers

Converting SQL Geography Lat/Long to VARCHAR without loss of precision

I have a geography column called Location. I need to SELECT the Location.Lat and Location.Long as a single VARCHAR result. Using the following I am losing precision: Query: SELECT CONVERT(VARCHAR(MAX), Location.Lat) + ' ' + CONVERT(VARCHAR(MAX),…
4
votes
2 answers

How to convert a float64 number to uint64 the right way?

package main func main() { var n float64 = 6161047830682206209 println(uint64(n)) } The output will be: 6161047830682206208 It looks like when float64 is converted to uint64, the fraction is discarded.
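
A Python sketch of the same effect (Python floats are binary64, like Go's float64; the helper `fits_exactly` is a hypothetical name). It suggests the digit is lost when the literal is stored as a float, not during the later integer conversion: binary64 has a 53-bit significand, and values of this magnitude are spaced 1024 apart.

```python
n = 6161047830682206209

# Storing the integer as binary64 rounds it to the nearest representable value.
as_float = float(n)
print(int(as_float))

def fits_exactly(i: int) -> bool:
    """True if the integer survives a round trip through binary64."""
    return int(float(i)) == i

print(fits_exactly(2**53))       # every integer up to 2**53 is exact
print(fits_exactly(2**53 + 1))   # the first integer that is not
```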
4
votes
1 answer

Significant decimal digits of binary32 and binary64

According to Wikipedia, the binary32 format has from 6 to 9 significant decimal digits of precision and the binary64 format has from 15 to 17. I found that these significant decimal digits have been calculated using the mantissa, but I didn't understand how…
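
One common way to derive these bounds from the significand width p (24 bits for binary32, 53 for binary64, counting the implicit leading bit) is sketched below. The function name is an assumption for illustration. The lower figure is the number of decimal digits that always survive decimal → binary → decimal; the upper figure is the number of decimal digits needed so that binary → decimal → binary always round-trips.

```python
import math

def decimal_digit_bounds(p: int):
    """Return (guaranteed_digits, round_trip_digits) for a p-bit significand."""
    guaranteed = math.floor((p - 1) * math.log10(2))
    round_trip = math.ceil(1 + p * math.log10(2))
    return guaranteed, round_trip

print(decimal_digit_bounds(24))  # binary32
print(decimal_digit_bounds(53))  # binary64
```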
4
votes
2 answers

Accurately predicting rounding error of cast between arbitrary floating-point formats

Let’s assume you have a float64_t number with an arbitrary value and you want to find out if said number can safely be down-cast to a float32_t with the restriction that the resulting rounding error must not exceed a given epsilon. A possible…
Regexident
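
A minimal Python sketch of the brute-force check: perform the down-cast and measure the error directly. The names `downcast_error` and `safe_to_downcast` are hypothetical, Python's `float` stands in for float64_t, and the sketch assumes finite inputs. Values too large for binary32 are treated as having infinite error.

```python
import math
import struct

def downcast_error(x: float) -> float:
    """Absolute rounding error of converting a binary64 value to binary32."""
    try:
        as_f32 = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # The value does not fit in the binary32 range at all.
        return math.inf
    return abs(as_f32 - x)

def safe_to_downcast(x: float, epsilon: float) -> bool:
    """True if the down-cast error does not exceed epsilon."""
    return downcast_error(x) <= epsilon

print(safe_to_downcast(0.5, 0.0))    # 0.5 is exact in binary32
print(safe_to_downcast(0.1, 1e-9))   # error for 0.1 is about 1.49e-9
print(safe_to_downcast(0.1, 1e-8))
```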
4
votes
1 answer

How to find shortest decimal number between two floating point numbers

I would like to write a function which finds the shortest decimal between value and the nearest two floating point values which are larger and smaller respectively. For example the decimal number 0.1 has decimal representation: 0.1 binary…
Beginner
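
A simple Python sketch of one approach: try increasing numbers of significant digits until the printed decimal converts back to the same double. The function name is an assumption. This is a brute-force illustration rather than a dedicated shortest-digits algorithm, and because it only tries the correctly rounded decimal at each length, it may return a slightly longer string than necessary for values at a power-of-two boundary.

```python
def shortest_round_trip(x: float) -> str:
    """Shortest correctly rounded decimal string that parses back to x."""
    for digits in range(1, 18):
        text = f"{x:.{digits}g}"
        if float(text) == x:
            return text
    # Reached only for values that never compare equal, such as NaN.
    return repr(x)

print(shortest_round_trip(0.1))
print(shortest_round_trip(1 / 3))
```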
4
votes
2 answers

Standard guarantees for using floating point arithmetic to represent integer operations

I am working on some code to be run on a very heterogeneous cluster. The program performs interval arithmetic using 3, 4, or 5 32 bit words (unsigned ints) to represent high precision boundaries for the intervals. It seems to me that representing…
4
votes
4 answers

Displaying a floating point number with fixed width before the decimal point

I want to display a floating point number with a fixed width before the decimal point. So far I managed to do it by taking the integer part, displaying desired width and filling with 0 using "%03d" before number value and then displaying the decimal part. I want…
sujan_014
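
In printf-style formatting the width field counts the whole output, including the decimal point and the fractional digits, so a fixed number of integer digits can be obtained with a single format. A Python sketch, with a hypothetical helper name, assuming non-negative values (a minus sign would take up one position of the width):

```python
def fixed_width(x: float, int_digits: int, frac_digits: int) -> str:
    """Zero-pad so that int_digits digits appear before the decimal point."""
    # Total width = integer digits + the point itself + fractional digits.
    width = int_digits + 1 + frac_digits
    return f"{x:0{width}.{frac_digits}f}"

print(fixed_width(3.14159, 3, 2))
print(fixed_width(42.5, 3, 1))
```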
4
votes
2 answers

Round down floating point conversion in Java

I have a line in my code much like below: float rand = (float) Math.random(); Math.random() returns a double that is >=0.0 and <1.0. Unfortunately, the cast above may set rand to 1.0f if the double is too close to 1.0. Is there a way to cast a…
Numeron
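
A Python sketch of one possible approach (not Java, and the function name is made up): convert with the default round-to-nearest, then step down to the previous binary32 value if the result ended up above the input. It assumes a finite, non-negative input within binary32 range, which covers the [0.0, 1.0) case in the question.

```python
import struct

def float32_round_down(d: float) -> float:
    """Largest binary32 value <= d, for finite non-negative d in range."""
    f = struct.unpack("<f", struct.pack("<f", d))[0]  # round to nearest
    if f > d:
        # For positive floats, the next lower value has bit pattern - 1.
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        f = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return f

just_below_one = 1.0 - 2.0**-53  # largest binary64 value below 1.0
nearest = struct.unpack("<f", struct.pack("<f", just_below_one))[0]

print(nearest)                             # the plain cast reaches 1.0
print(float32_round_down(just_below_one))  # stays strictly below 1.0
```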
4
votes
4 answers

std::pow with integer parameters, comparing to an integer type

According to http://en.cppreference.com/w/cpp/numeric/math/pow , when std::pow is used with integer parameters, the result is promoted to a double. My question is then the following: How safe is it to compare an integer type with the result of a…
vsoftco
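
The representability side of this question can be sketched in Python, where `math.pow` always works in binary64 while `**` on integers is exact. This is an analogy to `std::pow`, not C++ code, and the variable names are illustrative. Once the true result exceeds 2**53, a double can no longer hold every integer, so the comparison can fail no matter how accurate the library's `pow` is.

```python
import math

exact = 3**40                 # arbitrary-precision integer arithmetic
via_pow = math.pow(3, 40)     # computed and stored as binary64

print(exact > 2**53)          # beyond the range of exactly representable integers
print(int(via_pow) == exact)  # the double cannot hold this odd integer
print(int(via_pow) - exact)   # size of the discrepancy
```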
4
votes
3 answers

Converting 4 raw bytes into 32-bit floating point

I'm trying to reconstruct a 32-bit floating point value from an EEPROM. The 4 bytes in EEPROM memory (0-4) are: B4 A2 91 4D, and the PC (Visual Studio) reconstructs it correctly as 3.054199 * 10^8 (the floating point value I know should be…
ben
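
A Python sketch of decoding the four bytes quoted above with `struct`. Reading them as a little-endian IEEE 754 binary32 value gives the number mentioned in the excerpt; byte order is a frequent cause of mismatches in this kind of reconstruction, so the big-endian reading is shown for contrast. That the target platform's problem is byte order is an assumption, since the excerpt is truncated.

```python
import struct

raw = bytes.fromhex("B4A2914D")  # bytes in the order they sit in memory

little_endian = struct.unpack("<f", raw)[0]
big_endian = struct.unpack(">f", raw)[0]

print(little_endian)  # matches the expected 3.054199 * 10^8
print(big_endian)     # a very different value from the same bytes
```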
4
votes
1 answer

How to convert Uint8Array to a float in JavaScript?

I have an ArrayBuffer that I convert to a Uint8Array so that I can use traditional array access with square brackets and gather a subarray. Now that I have the correct set of 4 bytes that describe the 32-bit (little endian) floating point number, I…
tarabyte