Anything related to converting a floating point number to and from other representations.
Questions tagged [floating-point-conversion]
290 questions
4 votes, 1 answer
Without underflow and overflow, are there any two numbers where A < B in decimal form but A > B after converting to floating point?
For example, 0.00000000000000000000000000000000000000000000000000000000000000000000000000000011 is greater than 0.0000000000000000000000000000000000000000000000000000000000000000000000000000001 both in decimal and after converting to floating point…

ocomfd
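
The ordering question above can be probed by parsing decimal strings and comparing the resulting doubles. The sketch below uses Python, whose `float()` parsing is correctly rounded; the helper name and the sample values are illustrative assumptions, not the numbers from the question. Under round-to-nearest the conversion is monotonic, so A < B may collapse to equality but should not flip to A > B.

```python
from decimal import Decimal

def compare_after_conversion(a: str, b: str) -> str:
    """Compare two decimal strings exactly, then again after float conversion."""
    da, db = Decimal(a), Decimal(b)
    exact = "<" if da < db else ">" if da > db else "=="
    fa, fb = float(a), float(b)
    converted = "<" if fa < fb else ">" if fa > fb else "=="
    return f"{exact} exactly, {converted} as float"

# Two values far enough apart to stay distinct after conversion.
print(compare_after_conversion("1e-79", "1.1e-79"))

# Two values closer together than the spacing of doubles near 0.1:
# the strict ordering is lost, but it is not reversed.
print(compare_after_conversion("0.1", "0.1000000000000000001"))
```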
4 votes, 1 answer
Convert Integer to Float in Elisp
I am having trivial problems converting integer division to a floating point solution in Emacs Lisp 24.5.1.
(message "divide: %2.1f" (float (/ 1 2)))
"divide: 0.0"
I believe this expression first calculates 1/2, finds it is 0 after truncating,…

c3ad-io
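
The order-of-operations issue in the Elisp question above is not specific to Emacs. As a rough illustration in Python (an analogue, not the asker's code), integer division runs first and the fraction is already gone by the time the conversion to float happens. In Elisp the corresponding change would presumably be to make one operand a float before dividing, for example `(/ (float 1) 2)`.

```python
# Integer division happens first, so the fraction is lost before float() runs.
truncated_first = float(1 // 2)

# Converting an operand first keeps the whole division in floating point.
converted_first = float(1) / 2

print("divide: %2.1f" % truncated_first)
print("divide: %2.1f" % converted_first)
```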
4 votes, 2 answers
Extended precision floating point dangers in C#
I am writing a library for multiprecision arithmetic based on a paper I am reading. It is very important that I am able to guarantee the properties of floating point numbers I use. In particular, that they adhere to the IEEE 754 standard for double…

Void Star
4 votes, 3 answers
Java float conversion to string giving different results
Why does the Java code below print different values (in the decimal part) for the same float variable?
public static void main(String[] args) {
    float f = 2001797.11f;
    System.out.println(String.format("%013.2f", f));
    System.out.println(Float.toString(f));
…

sapan
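
For the Java question above, it may help to see which value a 32-bit float actually holds. Between 2^20 and 2^21 adjacent binary32 values are 0.125 apart, so 2001797.11 cannot be stored exactly. The Python sketch below (the helper `to_float32` is a hypothetical name) rounds the literal to binary32 via `struct`. A fixed two-decimal format and a shortest-digits conversion would then start from that same stored value and can reasonably print different decimal parts.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

stored = to_float32(2001797.11)

print(stored)            # the value a 32-bit float actually holds
print(stored - 2001797)  # the fractional part that was really stored
```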
4 votes, 2 answers
Converting SQL Geography Lat/Long to VARCHAR without loss of precision
I have a geography column called Location.
I need to SELECT the Location.Lat and Location.Long as a single VARCHAR result.
Using the following I am losing precision:
Query:
SELECT CONVERT(VARCHAR(MAX), Location.Lat) + ' ' + CONVERT(VARCHAR(MAX),…

KrustyGString
4 votes, 2 answers
How to convert a float64 number to uint64 the right way?
package main
func main() {
	var n float64 = 6161047830682206209
	println(uint64(n))
}
The output will be:
6161047830682206208
It looks like the fraction is discarded when the float64 is converted to uint64.

vzex
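
In the float64 question above, the difference most likely arises when the literal is stored as a float64, not during the conversion to uint64. A binary64 value has 53 significand bits, so near 6.16e18 only multiples of 1024 are representable. A small Python sketch (Python floats are binary64; the variable names are illustrative) shows the same rounding:

```python
import math

n = 6161047830682206209

# Converting the integer to binary64 rounds it to the nearest representable value.
as_float = float(n)

# frexp returns as_float = mantissa * 2**exponent with 0.5 <= mantissa < 1,
# so adjacent doubles near n are 2**(exponent - 53) apart.
mantissa, exponent = math.frexp(as_float)
spacing = 2.0 ** (exponent - 53)

print(int(as_float))
print(spacing)
```

If the exact integer matters, keeping it in an integer type end to end avoids the rounding.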
4 votes, 1 answer
significant decimal digits of binary32 and binary64
According to Wikipedia, the binary32 format has from 6 to 9 significant decimal digits of precision and the binary64 format has from 15 to 17.
I found that these significant decimal digits are calculated from the mantissa, but I didn't understand how…

alixander
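
The digit ranges quoted in the question above can be reproduced from the significand width p. The sketch assumes p counts the implicit leading bit (24 for binary32, 53 for binary64) and uses the commonly cited bounds: floor((p - 1) * log10(2)) digits always survive a decimal to binary to decimal round trip, and ceil(1 + p * log10(2)) digits are enough for binary to decimal to binary. The function name is hypothetical.

```python
import math

def decimal_digit_bounds(p: int):
    """Return (guaranteed digits, digits needed for round trip) for a p-bit significand."""
    lower = math.floor((p - 1) * math.log10(2))
    upper = math.ceil(1 + p * math.log10(2))
    return lower, upper

print(decimal_digit_bounds(24))  # binary32
print(decimal_digit_bounds(53))  # binary64
```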
4 votes, 2 answers
Accurately predicting rounding error of cast between arbitrary floating-point formats
Let’s assume you have a float64_t number with an arbitrary value and you want to find out if said number can safely be down-cast to a float32_t with the restriction that the resulting rounding error must not exceed a given epsilon.
A possible…

Regexident
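
One straightforward way to check the down-cast condition described above is simply to perform the cast and measure the difference. This is a sketch of that brute-force check, not necessarily the approach the asker had in mind; the function names are hypothetical and Python's `struct` stands in for a float64 to float32 cast.

```python
import struct

def downcast_error(x: float) -> float:
    """Absolute error introduced by rounding a binary64 value to binary32."""
    # struct.pack raises OverflowError when a finite x is too large for binary32.
    as_f32 = struct.unpack("<f", struct.pack("<f", x))[0]
    return abs(as_f32 - x)

def safe_to_downcast(x: float, epsilon: float) -> bool:
    """True when the rounding error of the cast does not exceed epsilon."""
    return downcast_error(x) <= epsilon

print(safe_to_downcast(0.5, 0.0))    # 0.5 is exactly representable in binary32
print(safe_to_downcast(0.1, 1e-12))  # 0.1 is not, and the error exceeds 1e-12
```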
4 votes, 1 answer
How to find shortest decimal number between two floating point numbers
I would like to write a function which finds the shortest decimal between value and the nearest two floating point values which are larger and smaller respectively.
For example the decimal number 0.1 has
decimal representation: 0.1
binary…

Beginner
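
As a point of comparison for the shortest-decimal question above, some language runtimes already print the shortest string that round-trips. In CPython 3, `repr` of a float behaves this way, which makes it easy to contrast the exact stored value with the shortest decimal that still identifies it. This is only an illustration of the idea, not an implementation of the search itself.

```python
from decimal import Decimal

x = 0.1

exact = Decimal(x)   # the exact binary value, written out in decimal
shortest = repr(x)   # the shortest decimal string that parses back to x

print(exact)
print(shortest)
print(float(shortest) == x)
```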
4 votes, 2 answers
Standard guarantees for using floating point arithmetic to represent integer operations
I am working on some code to be run on a very heterogeneous cluster. The program performs interval arithmetic using 3, 4, or 5 32 bit words (unsigned ints) to represent high precision boundaries for the intervals. It seems to me that representing…

Jack
4 votes, 4 answers
Displaying fixed width floating point number before decimal
I want to display a floating point number with a fixed width before the decimal point. So far I have managed to do it by taking the integer part, displaying it at the desired width zero-filled using "%03d", and then displaying the decimal part.
I want…

sujan_014
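
For the fixed-width question above, printf-style formats can usually do this in one step, because the field width counts every character, including the decimal point and the fractional digits. The sketch below uses Python's `%` formatting, which follows the same conventions; the helper name is hypothetical.

```python
def fixed_width(x: float, int_digits: int, frac_digits: int) -> str:
    """Zero-pad so at least `int_digits` digits appear before the decimal point."""
    # Total width = integer digits + the point + fractional digits.
    # A minus sign would also occupy one position of the field.
    total_width = int_digits + 1 + frac_digits
    return "%0*.*f" % (total_width, frac_digits, x)

print(fixed_width(3.14159, 3, 2))
print(fixed_width(42.5, 3, 2))
```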
4 votes, 2 answers
Round down floating point conversion in java
I have a line in my code much like below:
float rand = (float) Math.random();
Math.random() returns a double that is >=0.0 and <1.0. Unfortunately, the cast above may set rand to 1.0f if the double is too close to 1.0.
Is there a way to cast a…

Numeron
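
The rounding-up behaviour described in the question above can be reproduced outside Java. The Python sketch below (hypothetical helper names, positive finite inputs only) rounds a double to binary32 and, if the result ended up larger than the input, steps down to the previous binary32 value by decrementing its bit pattern. In Java the same step could presumably be written with `Math.nextDown`.

```python
import struct

def to_float32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_round_down(x: float) -> float:
    """Convert to binary32 without ever rounding up (positive finite x only)."""
    y = to_float32(x)
    if y > x:
        # For positive floats, the next smaller value has the bit pattern minus one.
        bits = struct.unpack("<I", struct.pack("<f", y))[0]
        y = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return y

close_to_one = 0.999999999
print(to_float32(close_to_one))          # nearest rounding reaches 1.0
print(float32_round_down(close_to_one))  # stays strictly below 1.0
```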
4 votes, 4 answers
std::pow with integer parameters, comparing to an integer type
According to http://en.cppreference.com/w/cpp/numeric/math/pow , when std::pow is used with integer parameters, the result is promoted to a double.
My question is then the following:
How safe is it to compare an integer type with the result of a…

vsoftco
4 votes, 3 answers
Converting 4 raw bytes into 32-bit floating point
I'm trying to re-construct a 32-bit floating point value from an eeprom.
The 4 bytes in eeprom memory (0-4) are : B4 A2 91 4D
and the PC (VS Studio) reconstructs it correctly as 3.054199 * 10^8 (the floating point value I know should be…

ben
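
The bytes quoted in the question above can be checked quickly in Python. The sketch assumes the four bytes are listed in memory order and tries both byte orders; interpreting them as little-endian gives the value the asker reports from the PC, which suggests that a mismatch on the other device may come down to byte order.

```python
import struct

raw = bytes.fromhex("B4A2914D")  # the four bytes, in EEPROM address order

little = struct.unpack("<f", raw)[0]  # least significant byte first
big = struct.unpack(">f", raw)[0]     # most significant byte first

print(little)
print(big)
```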
4 votes, 1 answer
How to convert Uint8Array to a float in javascript?
I have an ArrayBuffer that I convert to a Uint8Array so that I can use traditional array access with square brackets and gather a subarray. Now that I have the correct set of 4 bytes that describe the 32-bit (little endian) floating point number, I…

tarabyte
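
The idea behind the Uint8Array question above, reading a little-endian 32-bit float at some offset inside a larger byte buffer, looks like this in Python. The buffer contents here are made up for illustration. In JavaScript the closest counterpart is probably a `DataView` over the same `ArrayBuffer`, using `getFloat32(offset, true)`.

```python
import struct

# Hypothetical buffer: two padding bytes, a little-endian float, one trailing byte.
buffer = bytes([0xFF, 0xFF]) + struct.pack("<f", 1.5) + bytes([0xFF])

# Read four bytes starting at offset 2 and interpret them as a binary32 value.
value = struct.unpack_from("<f", buffer, 2)[0]

print(value)
```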