Questions tagged [floating-point]

Floating point numbers are approximations of real numbers that can represent a much larger range than integers occupying the same amount of memory, at the cost of lower precision. If your question is about small arithmetic errors (e.g. why does 0.1 + 0.2 equal 0.30000000000000004?) or decimal conversion errors, please read the tag page before posting.

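Many questions asked here about floating point math are about small inaccuracies in floating point arithmetic. To use the example from the excerpt, 0.1 + 0.2 evaluates to 0.30000000000000004 in IEEE 754 double precision instead of the expected 0.3. Errors like these are caused by the way floating point numbers are represented in computers' memory. Values are stored in binary, and decimal fractions such as 0.1, 0.2 and 0.3 have no exact binary representation, so each one is rounded to the nearest value that can be stored.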
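The following is a minimal Python sketch of the 0.1 + 0.2 example. It assumes CPython on a platform whose float is an IEEE 754 double, which is true on mainstream hardware. Decimal(0.1) prints the exact binary value that the literal 0.1 was rounded to.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the two sides round to different doubles
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Decimal(float) converts the stored binary value exactly, with no rounding.
exact_tenth = Decimal(0.1)
print(exact_tenth)               # 0.1000000000000000055511151231257827021181583404541015625
```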

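Integers are stored as exact values of the numbers they represent, as long as they fit in the range of their type. Floating point numbers are stored as a sign, a significand and an exponent, each with a fixed number of bits. Only finitely many such combinations exist. Most real numbers, including 0.1, therefore have no exactly matching significand-exponent pair and must be rounded to the nearest one. As a result, some approximation and therefore inaccuracy is unavoidable.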
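You can inspect the significand-exponent split from Python with the standard math.frexp, float.hex and float.as_integer_ratio functions. This sketch assumes IEEE 754 doubles. It shows that the stored value of 0.1 is a fraction whose denominator is a power of two, which is why 1/10 itself cannot be stored exactly.

```python
import math

x = 0.1
mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
print(mantissa, exponent)           # 0.8 -3
print(x.hex())                      # 0x1.999999999999ap-4 (hex significand, binary exponent)

# The exact stored value as a fraction: the denominator is a power of two,
# and no such fraction equals 1/10, because 10 has the factor 5.
num, den = x.as_integer_ratio()
print(num, den)                     # 3602879701896397 36028797018963968
print(den == 2**55)                 # True
```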

Two commonly cited introductory resources about floating point math are What Every Computer Scientist Should Know About Floating-Point Arithmetic and The Floating-Point Guide at floating-point-gui.de.

FAQs:

Why 0.1 does not exist in floating point

Floating Point Math at https://0.30000000000000004.com/

Related tags:

ieee-754 (the most widely used standard for floating-point computation)
- half-precision-float (16-bit float)
- single-precision (32-bit float)
- double-precision (64-bit float)
- extended-precision (80-bit float, usually)
- quadruple-precision (128-bit float)
types in C and C++
- double
- long-double
aspects of floating point numbers and computations
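This sketch shows how the width of the format affects the error. It rounds 0.1 to half, single and double precision using the struct module's "e", "f" and "d" format codes. The "e" code needs Python 3.6 or later. The 80-bit and 128-bit formats have no struct code and are not covered. The error column is measured against the double nearest 0.1, so the double row shows 0.0 even though that value is not exactly 1/10.

```python
import struct

def round_trip(value: float, code: str) -> float:
    """Round value to the IEEE 754 format named by a struct code, then widen it back."""
    return struct.unpack("<" + code, struct.pack("<" + code, value))[0]

FORMATS = [("half (16-bit)", "e"), ("single (32-bit)", "f"), ("double (64-bit)", "d")]

for name, code in FORMATS:
    stored = round_trip(0.1, code)
    # Error relative to the double 0.1, not to the real number 1/10.
    print(f"{name:16} {stored!r:22} error={stored - 0.1:+.3e}")
```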

Programming languages where all numbers are double-precision (64-bit) floats:

javascript (see Number.MAX_SAFE_INTEGER on MDN and What is JavaScript's highest integer value that a Number can go to without losing precision?)
awk (see Expressions in awk in POSIX)
lua (up to 5.2 only; 5.3 introduced integers, see Changes in the Language in the Lua 5.3 manual)
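These languages store every number as a double, so integers are exact only while they fit in the 53-bit significand. The Python sketch below assumes IEEE 754 doubles and shows the boundary that JavaScript exposes as Number.MAX_SAFE_INTEGER (2**53 - 1). The constant name in the code only mirrors that JavaScript property.

```python
MAX_SAFE_INTEGER = 2**53 - 1   # mirrors JavaScript's Number.MAX_SAFE_INTEGER

# Every integer up to this bound converts to a double exactly.
print(float(MAX_SAFE_INTEGER) == MAX_SAFE_INTEGER)  # True

# Between 2**53 and 2**54 consecutive doubles are 2 apart, so odd integers are lost.
limit = float(2**53)
print(limit + 1.0 == limit)                         # True: 2**53 + 1 rounds back down
print(float(2**53 + 1) == 2**53)                    # True
```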

15006 questions

2 answers

Why do these two float64s have different values?

Consider these two cases: fmt.Println(912 * 0.01) fmt.Println(float64(912) * 0.01) (Go Playground link) The second one prints 9.120000000000001, which is actually fine, I understand why that is happening. However, why does the first line print…

go floating-point floating-accuracy

asked Jan 07 '15 at 12:04

Attila O.

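The difference comes from when rounding happens. Under the Go specification's rules for constants, an untyped constant expression such as 912 * 0.01 is evaluated exactly and rounded only when the result becomes a float64. In float64(912) * 0.01, the constant 0.01 takes the type float64 and is rounded to binary first. The Python sketch below only imitates those two orders of rounding. Fraction stands in for exact constant arithmetic and does not reflect how the Go compiler is implemented.

```python
from fractions import Fraction

# 0.01 rounded to a double first, then the product rounded again.
rounded_first = 912.0 * 0.01
print(rounded_first)        # 9.120000000000001

# Exact rational product (228/25), rounded to a double only once at the end.
exact = Fraction(912) * Fraction(1, 100)
rounded_once = float(exact)
print(exact, rounded_once)  # 228/25 9.12
```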

3 answers

How do I find the largest integer less than x?

If x is 2.3, then math.floor(x) returns 2.0, the largest integer smaller than or equal to x (as a float). How would I get i, the largest integer strictly smaller than x (as an integer)? The best I came up with is: i = int(math.ceil(x)-1) Is there a…

python math floating-point

asked Jan 03 '15 at 19:22

pheon

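In Python 3, math.ceil already returns an int, so the expression from the question works without the int() call. That call mattered in Python 2, where math.ceil returned a float. The sketch below wraps the expression in a function with a hypothetical name. It assumes x is finite, because math.ceil raises an exception for infinities and NaN.

```python
import math

def largest_int_below(x: float) -> int:
    """Largest integer strictly less than x (assumes x is finite)."""
    # ceil(x) is the smallest integer >= x; when x is already integral,
    # ceil(x) == x, so subtracting 1 excludes x itself in every case.
    return math.ceil(x) - 1

print([largest_int_below(x) for x in (2.3, 2.0, -2.3, -2.0)])  # [2, 1, -3, -3]
```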

1 answer

Go atomic.AddFloat32()

I need a function to atomically add float32 values in Go. This is what I came up with based on some C code I found: package atomic import ( "sync/atomic" "unsafe" "math" ) func AddFloat32(addr *float32, delta float32) (new float32) { …

go floating-point addition atomic

asked Dec 15 '14 at 20:14

B_old


3 answers

How can I add floats together in different orders, and always get the same total?

Let's say I have three 32-bit floating point values, a, b, and c, such that (a + b) + c != a + (b + c). Is there a summation algorithm, perhaps similar to Kahan summation, that guarantees that these values can be summed in any order and always…

floating-point math numerical

asked Apr 24 '10 at 11:52

splicer

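One way to get an order-independent total is correctly rounded summation: compute the exact sum and round it only once. On typical IEEE 754 platforms, Python's math.fsum does this. It is a different technique from Kahan summation, which only reduces the error. The sketch uses doubles rather than the 32-bit floats from the question. The values are chosen only to make the two groupings disagree.

```python
import itertools
import math

a, b, c = 1e16, 1.0, 1.0

# With plain addition, grouping changes the result: 1e16 + 1.0 rounds back to 1e16.
print((a + b) + c, a + (b + c))   # 1e+16 1.0000000000000002e+16

# fsum rounds the exact sum once, so every ordering gives the same double.
totals = {math.fsum(order) for order in itertools.permutations([a, b, c])}
print(totals)                     # {1.0000000000000002e+16}
```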

1 answer

Strange behavior of program in GNU C++, using floating-point numbers

Look at this program: #include #include using namespace std; typedef pair coords; double dist(coords a, coords b) { return sqrt((a.first - b.first) * (a.first - b.first) + (a.second - b.second) *…

c++ gcc floating-point double compiler-optimization

asked Nov 12 '14 at 11:06

Denis Kirienko

3 answers

How are upper and lower bounds for floating point numbers determined?

I have a question about the quote below (N3797, 3.9.1/8): The value representation of floating-point types is implementation-defined. As far as I understand it gives the implementation complete freedom in defining boundaries of floating point…

c++ c floating-point

asked Oct 19 '14 at 08:10

user2953119

4 answers

How to quickly check whether a double fits in a float? (Java)

Are there some arithmetic or bitwise operations that can check whether a double fits into a float without loss of precision? It should not only check that the double's value is within the float range, but also that no mantissa bits get lost. Bye P.S.:…

java floating-point

asked Sep 29 '14 at 12:21

user502187
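The question asks for arithmetic or bitwise tricks in Java. A simpler technique is a round-trip check: narrow the double to single precision and test whether it comes back unchanged. It is shown here as a Python sketch. The struct module does the narrowing and raises OverflowError for finite values beyond the float32 range. Treating NaN as fitting is a choice made for this sketch, and fits_in_float32 is a hypothetical name. In Java the same idea is comparing (double) (float) d with d after handling NaN.

```python
import math
import struct

def fits_in_float32(x: float) -> bool:
    """True if the double x is exactly representable as an IEEE 754 single."""
    if math.isnan(x):
        return True              # assumption: NaN counts as fitting (payload bits ignored)
    try:
        narrowed = struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:        # finite but too large for float32
        return False
    return narrowed == x         # unchanged after narrowing means no bits were lost
```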

1 answer

Do gcc's __float128 floating point numbers take the current rounding mode into account?

Do the arithmetic operations on gcc's __float128 floating point numbers take the current rounding mode into account? For instance, if using the C++11 function std::fesetenv, I change the rounding mode to FE_DOWNWARD, will results of arithmetic…

c++ c++11 gcc floating-point

asked Sep 23 '14 at 06:39

Walter Mascarenhas

5 answers

Swift extract an Int, Float or Double value from a String (type-conversion)

Please could you help me here? I need to understand how to convert a String into an Int, Float or Double! This problem occurs when I'm trying to get the value from an UITextField and need this type of conversion! I used to do it like this: var…

string swift floating-point int double

asked Sep 07 '14 at 08:55

365Cases

3 answers

How to read in one character at a time from a file in python?

I want to read in a list of numbers from a file as chars one char at a time to check what that char is, whether it is a digit, a period, a + or -, an e or E, or some other char...and then perform whatever operation I want based on that. How can I do…

python file floating-point

asked Sep 01 '14 at 19:15

Harley Jones
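In text mode, f.read(1) returns one character, and it returns an empty string at end of file. That is enough for the character-by-character scan the question describes. The sketch below is minimal. The function name classify_chars and the category names are hypothetical. io.StringIO stands in for a real file, such as one opened with open("numbers.txt"), where the file name is also only an example.

```python
import io

DIGITS = "0123456789"

def classify_chars(f):
    """Yield (char, kind) pairs, reading the text stream one character at a time."""
    while True:
        ch = f.read(1)          # one character in text mode; "" means end of file
        if ch == "":
            return
        if ch in DIGITS:
            kind = "digit"
        elif ch == ".":
            kind = "period"
        elif ch in "+-":
            kind = "sign"
        elif ch in "eE":
            kind = "exponent"
        else:
            kind = "other"
        yield ch, kind

# A real file opened with open(...) can be passed in the same way.
for ch, kind in classify_chars(io.StringIO("-1.5e3\n")):
    print(repr(ch), kind)
```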

2 answers

How to make InvariantCulture recognize a comma as a decimal separator?

How do I parse 1,2 with Single.Parse? The reason for asking is that, when I am using CultureInfo.InvariantCulture, I don't get 1.2 as I would like, but rather 12. Shouldn't "Invariant Culture" ignore the culture? Consider the following…

c# parsing floating-point

asked Jun 23 '14 at 21:50

default


2 answers

Why is there int but not float in Go?

In Go, there's the type int which may be equivalent to int32 or int64 depending on the system architecture. I can declare an integer variable without worrying about its size with: var x int Why isn't there the type float, which would be equivalent…

types floating-point go int

asked Jun 22 '14 at 18:37

cd1


1 answer

Could not find an overload for '*' that accepts the supplied argument

I have converted a String to an Int by using toInt(). I then tried multiplying it by 0.01, but I get an error that says Could not find an overload for '*' that accepts the supplied argument. Here is my code: var str: Int = 0 var pennyCount =…

floating-point integer double swift multiplication

asked Jun 05 '14 at 22:02

fairbanksdan

1 answer

Will different math CPUs yield the same floating point results?

I'm developing OS-portable software that has unit tests that must work on Linux, UNIX, and Windows. Imagine this unit test that asserts that the IEEE single-precision floating point value 1.26743237e+015f is converted to a string: void…

c++ floating-point precision floating-accuracy floating-point-conversion

asked May 31 '14 at 07:52

user152949
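Even when the arithmetic agrees, the text a C runtime produces when formatting a value can differ between platforms. Some older Microsoft runtimes, for instance, print three-digit exponents such as e+015. A test is therefore more portable if it parses the string back and compares values. The Python sketch below assumes IEEE 754 single precision via struct, and to_float32 is a hypothetical helper. It uses 9 significant digits, which is enough to round-trip any float32.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

expected = to_float32(1.26743237e15)

# Two spellings of the same number, in the exponent styles different runtimes produce.
for text in ("1.26743237e+15", "1.26743237e+015"):
    print(text, to_float32(float(text)) == expected)   # True for both

# 9 significant digits always identify a float32 uniquely, so this round-trips.
nine_digits = f"{expected:.9g}"
print(nine_digits, to_float32(float(nine_digits)) == expected)
```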

6 answers

Convert float to string without sprintf()

I'm coding for a microcontroller-based application and I need to convert a float to a character string, but I do not need the heavy overhead associated with sprintf(). Is there any elegant way to do this? I don't need too much. I only need 2 digits…

c string memory floating-point printf

asked Apr 21 '14 at 05:13

audiFanatic

