Good way to approximate a floating point number

Question

I have a program that solves equations, and sometimes the solutions x1 and x2 have many decimal digits. For example, when Δ = 201 (Δ = discriminant), the square root gives me a floating-point number.

I need a good approximation of that number because I also have a function that converts it into a fraction. So I thought of doing this:

 Result := FormatFloat('0.#####', StrToFloat(solx1));

The solx1 is a double. In this way, the number '456,9067896' becomes '456,90679'.

My question is this: if I approximate in this way, the fraction of 456,9067896 will be correct (and the same) if I have 456,90679?

No, I've no idea what you are asking. Downvoting random answers doesn't achieve a right lot. – David Heffernan Oct 26 '13 at 10:40
Alberto, sounds like you need to read this: "What Every Computer Scientist Should Know About Floating-Point Arithmetic" http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html In your case you probably want to use symbolic math libraries and bignum types, and rolling your own bignum types is not a job for the fainthearted. – Warren P Oct 27 '13 at 20:16
@Warren Certainly does not sound to me like symbolic math is needed here. Good old finite precision floating point seems fine. – David Heffernan Oct 30 '13 at 08:00
You don't think he's going to hit fractional values that are not precisely expressible in binary, causing him grief? – Warren P Oct 30 '13 at 13:51

score 5 · Answer 1 · answered Oct 26 '13 at 10:22

the fraction of 456,9067896 will be correct (and the same) if I have 456,90679?

No, because 0.9067896 is unequal to 0.90679.

But why do you want to round the numbers? Just let them be as they are. Shorten them only for visual representation.
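To make the point above concrete, here is a minimal sketch that reduces both decimal values to lowest terms using integer arithmetic only. The names `Gcd` and `PrintFraction` are hypothetical helpers written for this illustration; they are not the asker's conversion function, which is not shown in the question.

```pascal
program FractionDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Greatest common divisor (Euclid's algorithm).
function Gcd(A, B: Int64): Int64;
var
  T: Int64;
begin
  while B <> 0 do
  begin
    T := A mod B;
    A := B;
    B := T;
  end;
  Result := A;
end;

// Hypothetical helper: reduces Scaled / Scale to lowest terms and prints it.
// Scaled is the decimal value with the separator removed and Scale is the
// matching power of ten, so no floating-point rounding is involved.
procedure PrintFraction(const Caption: string; Scaled, Scale: Int64);
var
  G: Int64;
begin
  G := Gcd(Scaled, Scale);
  Writeln(Caption, ' = ', Scaled div G, '/', Scale div G);
end;

begin
  PrintFraction('456.9067896', 4569067896, 10000000);
  PrintFraction('456.90679', 45690679, 100000);
end.
```

The first value reduces to 571133487/1250000 and the second to 45690679/100000, so rounding to five decimals yields a different fraction.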

NGLN

Sometimes I have numbers that end with 1E15 for example, and they give me errors when I put them in my function. – Alberto Rossi Oct 26 '13 at 10:23
Don't convert the numbers to strings for calculations. – NGLN Oct 26 '13 at 10:25
@AlbertoRossi - this is now starting to sound like your real problem. Suggest you try posting a new question explaining the problem with your function and what, exactly, the errors are. – J... Oct 27 '13 at 10:27
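One way to follow the advice in these comments is sketched below: every calculation stays in `Double`, and `FormatFloat` is applied only to the text shown to the user. The coefficients are assumed for illustration (they are chosen so that the discriminant is 201) and do not come from the question.

```pascal
program DisplayOnly;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  A, B, C, Delta, X1, X2: Double;
begin
  // Assumed example: x^2 + 15x + 6 = 0, so Delta = 225 - 24 = 201.
  A := 1;
  B := 15;
  C := 6;
  Delta := B * B - 4 * A * C;

  // Keep full double precision for every further calculation.
  X1 := (-B + Sqrt(Delta)) / (2 * A);
  X2 := (-B - Sqrt(Delta)) / (2 * A);

  // Shorten only the text that is displayed.
  Writeln('x1 = ', FormatFloat('0.#####', X1));
  Writeln('x2 = ', FormatFloat('0.#####', X2));

  // Later routines receive X1 and X2 themselves, never the formatted strings.
  Writeln('x1 + x2 = ', FormatFloat('0.#####', X1 + X2));
end.
```

Because no value makes a round trip through a string, display forms such as exponent notation never reach the calculation code.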

score 3 · Answer 2 · answered Oct 26 '13 at 10:28

If you are worried about complete correctness of the result, you should not use floating-point numbers at all, because floating-point values are, by construction, rounded approximations of real numbers. Only the first 5-6 decimal digits of a 32-bit floating-point number are generally reliable; the following ones are unreliable because of rounding error.

If you want complete precision, you should be using symbolic maths (rational numbers and symbolic representation for irrational/imaginary numbers).
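As a small taste of what a symbolic representation could look like for the discriminant case, the sketch below writes the square root of an integer as an integer factor times the root of a square-free integer, without touching floating point. `SplitSqrt` is a hypothetical helper invented for this example, and it assumes a positive argument; a full symbolic solver would need much more than this.

```pascal
program SimplifySqrt;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: writes Sqrt(N) as Outside * Sqrt(Inside) with Inside
// square-free, using integer arithmetic only. Assumes N > 0.
procedure SplitSqrt(N: Int64; out Outside, Inside: Int64);
var
  F: Int64;
begin
  Outside := 1;
  Inside := N;
  F := 2;
  while F * F <= Inside do
  begin
    if Inside mod (F * F) = 0 then
    begin
      // Pull the factor F^2 out of the root.
      Inside := Inside div (F * F);
      Outside := Outside * F;
    end
    else
      Inc(F);
  end;
end;

var
  Outside, Inside: Int64;
begin
  SplitSqrt(201, Outside, Inside);
  Writeln('Sqrt(201) = ', Outside, ' * Sqrt(', Inside, ')');
  SplitSqrt(200, Outside, Inside);
  Writeln('Sqrt(200) = ', Outside, ' * Sqrt(', Inside, ')');
end.
```

Since 201 = 3 * 67 has no square factor, it stays as 1 * Sqrt(201), while 200 becomes 10 * Sqrt(2). A solution can then be stored exactly as the integers in (-b ± Outside * Sqrt(Inside)) / (2a) and converted to a decimal only for display.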

Giulio Franco

Nice answer, thank you. So at least I should approximate to 5 decimal digits? – Alberto Rossi Oct 26 '13 at 10:29
Why do you feel the need to approximate at all? Why do you think this answer suggests approximation to 5 decimal digits? I trust you realise that floating point IEEE754 as you use it is binary and not decimal. – David Heffernan Oct 26 '13 at 11:17
@AlbertoRossi you should not approximate at all. Every approximation you perform is a further loss of precision, which adds to the loss of precision due to the conversion to floating point. – Giulio Franco Oct 26 '13 at 12:03
-1 What are you talking about? 32 bit 'double' values give from 15–17 significant decimal digits precision. http://en.wikipedia.org/wiki/Double-precision_floating-point_format – Arnaud Bouchez Oct 26 '13 at 14:01
@Arnaud He's talking about 32 bit single precision. You are talking about 64 bit double precision. – David Heffernan Oct 26 '13 at 14:06
@DavidHeffernan You are right - my mistake! But even `single` gives from 6 to 9 significant decimal digits of precision - more than "5-6". If a decimal string with at most 6 significant decimal digits is converted to IEEE 754 single precision and then converted back to the same number of significant decimal digits, then the final string should match the original; and if an IEEE 754 single precision value is converted to a decimal string with at least 9 significant decimal digits and then converted back to single, then the final number must match the original. – Arnaud Bouchez Oct 26 '13 at 15:13
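The difference between the two types discussed in these comments can be observed with a short sketch like the one below. It stores the same square root in a `Single` and in a `Double` and prints both with more decimals than a `Single` can hold. The exact digits printed depend on the compiler and runtime, so none are quoted here.

```pascal
program PrecisionDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  S: Single;
  D, SWidened: Double;
begin
  S := Sqrt(201);   // rounded to 32-bit single precision
  D := Sqrt(201);   // rounded to 64-bit double precision

  // Widen the Single so both values are printed by the same routine.
  SWidened := S;

  Writeln('Single: ', SWidened:0:12);
  Writeln('Double: ', D:0:12);
  Writeln('Difference: ', Abs(D - SWidened):0:12);
end.
```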

Arnaud Bouchez · Accepted Answer · 2013-10-26T14:03:15.063

2

To compare two floating point values with a given precision, just use the SameValue() function from the Math unit or its sibling CompareValue().

if SameValue(456.9067896, 456.90679, 1E-5) then ...

You can specify the precision with which the comparison will take place.

Or you can use a Currency value, which has a fixed arithmetic precision of 4 decimal digits. So it won't have rounding issues any more. But you cannot do every mathematical computation with it (huge or tiny numbers are not handled properly): its main use is for accounting computations.

You should never use string representations to compare floats: it can be very confusing, and strings do not offer good rounding abilities.
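A minimal sketch of the two Math unit functions named above, applied to the numbers from the question. It assumes the usual RTL behaviour in which an omitted or zero `Epsilon` makes the function pick a very small tolerance based on the operands; check the Math unit source of your Delphi or Free Pascal version if this matters.

```pascal
program SameValueDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

const
  Tol: Double = 1E-5;

var
  X, Y: Double;
begin
  X := 456.9067896;
  Y := 456.90679;   // differs from X by about 4E-7

  // Explicit tolerance: the difference is well below 1E-5.
  Writeln('SameValue(X, Y, Tol): ', SameValue(X, Y, Tol));

  // Tolerance omitted: the RTL chooses a much stricter one.
  Writeln('SameValue(X, Y): ', SameValue(X, Y));

  // CompareValue returns a negative, zero or positive value.
  Writeln('CompareValue(X, Y, Tol): ', CompareValue(X, Y, Tol));
  Writeln('CompareValue(X, Y): ', CompareValue(X, Y));
end.
```

Note that treating the two values as "the same" within 1E-5 does not make their fractions identical; it only says the values are close enough for the chosen tolerance.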

edited Oct 26 '13 at 14:03

answered Oct 26 '13 at 13:57

Personally I'd just as soon write abs(x-y) – David Heffernan Oct 26 '13 at 14:12
Now, what are you talking about in your discussion of Currency? You suggest that using currency avoids rounding. Your language is imprecise there. In any case, surely there's no use at all for decimal fixed point in numerical algos. – David Heffernan Oct 26 '13 at 14:22
@DavidHeffernan Simple `abs(x-y) – Arnaud Bouchez Oct 26 '13 at 15:20
That depends on the definition of "won't work". My definition would imply that `SameValue` does not work. Its implementation is somewhat bogus. – David Heffernan Oct 26 '13 at 15:27
@ArnaudBouchez SameValue uses relative error, which is abs(x-y)/abs(x) < eps. This is indeed the correct interpretation of the eps value, which indicates the smallest number such that 1 + eps != 1 – Giulio Franco Oct 26 '13 at 16:50
@Giulio You cannot say that relative error is correct. It's correct if you want relative error. It's incorrect if you want absolute error. And are you sure SameValue uses relative error? – David Heffernan Oct 26 '13 at 17:36
Are absolute error and relative error discussed in that What Every Computer Scientist Should Know About Floating Point paper? – Warren P Oct 27 '13 at 20:14
@WarrenP That seems to be the case, [if you are talking about this paper](https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf) – Arnaud Bouchez Oct 28 '13 at 07:49
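The distinction between absolute and relative error debated in these comments can be sketched with two small functions. `CloseAbs` and `CloseRel` are hypothetical names introduced here, and the relative test uses a common symmetric variant (scaling by the larger magnitude) rather than the exact formula quoted in the comment. Neither is a statement about how `SameValue` is implemented.

```pascal
program ToleranceDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Math;

// Hypothetical helper: absolute test, |X - Y| <= Tol.
function CloseAbs(X, Y, Tol: Double): Boolean;
begin
  Result := Abs(X - Y) <= Tol;
end;

// Hypothetical helper: relative test, |X - Y| <= Tol * max(|X|, |Y|).
function CloseRel(X, Y, Tol: Double): Boolean;
begin
  Result := Abs(X - Y) <= Tol * Max(Abs(X), Abs(Y));
end;

begin
  // Large values that are 1 apart: tiny relative to 1E9.
  Writeln('abs: ', CloseAbs(1E9, 1E9 + 1, 1E-5),
          '  rel: ', CloseRel(1E9, 1E9 + 1, 1E-5));

  // Small values that are 1E-7 apart, but one is twice the other.
  Writeln('abs: ', CloseAbs(1E-7, 2E-7, 1E-5),
          '  rel: ', CloseRel(1E-7, 2E-7, 1E-5));
end.
```

With a tolerance of 1E-5, the absolute test rejects the first pair and accepts the second, while the relative test does the opposite. Which behaviour is "correct" depends on what the calling code needs.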
