1

I understand that floating-point calculation is not accurate due to its nature. I'm trying to find out the best library/way to do multi-precision ration comparison. I'm comparing Fraction, mpq and mpfr. The later two are from gmpy2 library. The first one is from fractions package. I'm using python3.3

This is the script I used to compare. Not very well written, a very simple one.

from fractions import Fraction
from gmpy2 import mpq, mpfr
import time

# This script compares gmpy2 library and Fraction library

total_pass_mpq = 0
total_pass_mpfr = 0
total_pass_frc = 0

a = mpq("-3.232429")
a_ = Fraction("-3.232429")
a__ = mpfr("-3.232429")
if str(float(a)) == "-3.232429":
    total_pass_mpq +=1
if str(float(a_)) == "-3.232429":
    total_pass_frc += 1
if str(float(a__)) == "-3.232429":
    total_pass_mpfr += 1

b = mpq("604.08")
c = mpq("1.979")
b_ = Fraction("604.08")
c_ = Fraction("1.979")
b__ = mpfr("604.08")
c__ = mpfr("1.979")
if str(float(b*c)) == "1195.47432":
    total_pass_mpq += 1
if str(float(b_*c_)) == "1195.47432":
    total_pass_frc += 1
if str(float(b__*c__)) == "1195.47432":
    total_pass_mpfr += 1

d = mpq(604.08)
e = mpq(1.979)
d_ = Fraction(604.08)
e_ = Fraction(1.979)
d__ = mpfr(604.08)
e__ = mpfr(1.979)
if str(float(d*e)) == "1195.47432":
    total_pass_mpq += 1
if str(float(d_*e_)) == "1195.47432":
    total_pass_frc += 1
if str(float(d__*e__)) == "1195.47432":
    total_pass_mpfr += 1

f = mpq(-3.232429)
f_ = Fraction(-3.232429)
f__ = mpfr(-3.232429)
if str(float(f)) == "-3.232429":
    total_pass_mpq +=1
if str(float(f_)) == "-3.232429":
    total_pass_frc += 1
if str(float(f__)) == "-3.232429":
    total_pass_mpfr +=1

g = mpq(503.79)
g_ = Fraction(503.79)
g__ = mpfr(503.79)
h = mpq(0.07)
h_ = Fraction(0.07)
h__ = mpfr(0.07)
if str(float(g*(1+h))) == "539.0553":
    total_pass_mpq += 1
if str(float(g_*(1+h_))) == "539.0553":
    total_pass_frc += 1
if str(float(g__*(1+h__))) == "539.0553":
    total_pass_mpfr += 1

print("Total passed mpq: " + str(total_pass_mpq))
print("Total passed Fraction: " + str(total_pass_frc))
print("Total passed mpfr: " + str(total_pass_mpfr))

start_mpq = time.time()
for i in range(0, 50000):
    y = mpq(0.32329)
    z = mpq(-1)
    yz = y*z
end_mpq = time.time()
print("Time for executing mpq: " + str(end_mpq - start_mpq))

start_frc = time.time()
for j in range(0, 50000):
    y = Fraction(0.32329)
    z = Fraction(-1)
    yz_ = y*z
end_frc = time.time()
print("Time for executing frc: " + str(end_frc - start_frc))

start_frc_2 = time.time()
for j_ in range(0, 50000):
    y = Fraction(0.32329)
    z = Fraction(-1)
    yz_2 = y*z
end_frc_2 = time.time()
print("Time for executing frc str: " + str(end_frc_2 - start_frc_2))

start_mpfr = time.time()
for k in range(0, 50000):
    y = mpfr(0.32329)
    z = mpfr(-1)
    yz__ = y*z
end_mpfr = time.time()
print("Time for executing mpfr: " + str(end_mpfr - start_mpfr))

start_mpfr_2 = time.time()
for k_ in range(0, 50000):
    y = mpfr("0.32329")
    z = mpfr("-1")
    yz__2 = y*z
end_mpfr_2 = time.time()
print("Time for executing mpfr str: " + str(end_mpfr_2 - start_mpfr_2))

This is the result:

Total passed mpq: 3
Total passed Fraction: 5
Total passed mpfr: 4
Time for executing mpq: 0.04700875282287598
Time for executing frc: 2.1327619552612305
Time for executing frc str: 2.0934295654296875
Time for executing mpfr: 0.05441713333129883
Time for executing mpfr str: 0.12844634056091309

So basically I've got the result that Fraction is the most accurate one, but it's super slow. For this question, I wanted to ask,

  1. is there any other case that you think I should also try?
  2. any other library?
  3. If speed matters, is there a way to improve precision using gmpy2 library?
pythonician_plus_plus
  • 1,244
  • 3
  • 15
  • 38
  • 1
    `mpq` and `Fraction` should be equal (effectively infinite) precision, since they're both storing arbitrary precision `int`s as numerator and denominator. I suspect your tests are poorly designed (relying on floating point representation too much) if they're claiming the two of them don't have matching precision. `mpq` should basically be a faster version of `Fraction`, that's all. In both cases though, initializing from a `float` is asking for trouble; `float`s have representation issues that different rational number types might convert differently. – ShadowRanger Oct 18 '16 at 04:26
  • @ShadowRanger For example, `float(mpq("-3.232429"))` gives me `-3.2324289999999998` while `float(Fraction("-3.232429"))` gives me `-3.232429`. Do you think this is expected? – pythonician_plus_plus Oct 18 '16 at 04:37
  • You're assuming the goal of the libraries is to handle floating point math. It's not. As soon as you leave the realm of rational numbers, the libraries are outside their intended use case. I don't know precisely where the "error" is creeping in, but the conversion back from `mpq` to `float` could differ because of too _much_ precision in GMP (where floating point error varies with precision; might help to convert yourself `mympq.numerator / mympq.denominator` to have Python do it), vs. too little. If you're converting to and from `float`, you're not doing rational number math properly. – ShadowRanger Oct 18 '16 at 04:50
  • @ShadowRanger What about this: ```>>> from gmpy2 import sub >>> sub(mpq("-3.232429"), mpq(-3.232429)) >>> mpq(7697,35184372088832000000)``` Why it's not zero? Isn't it inaccurate? – pythonician_plus_plus Oct 18 '16 at 04:57
  • That's true for `Fraction` too, because as I've already said YOU CAN'T USE `float` WITH RATIONAL NUMBER LIBRARIES! THAT'S NOT WHAT THEY'RE FOR! One of those becomes the fraction `-3232429/1000000`, the other becomes `-7278783019950793/2251799813685248`, because that's the actual ratio represented by the `float` value -3.232429. The string is exact, and both `Fraction` and `mpq` use it to get the precise "divide by power of 10" representation. A `float` is going to be converted less intuitively, because you're giving them a number that isn't actually the rational you think it is. – ShadowRanger Oct 18 '16 at 05:02
  • Basically, all of your errors are because `float`s don't behave the way you want. If you give an exact library an inexact value, the best it can do is accurately represent the inexact value you gave it. That's your fault. (Side-note: `mpq` should support `-`, no need for `sub` function) – ShadowRanger Oct 18 '16 at 05:05
  • @ShadowRanger Ok, you are right. That's same for Fraction as well. But for `mpq("-3.232429")`, it's wrapped by mpq and I shouldn't convert it to or from`float` based on what you said, how can I use it? For example, print it out or send it to a REST call? – pythonician_plus_plus Oct 18 '16 at 05:07

1 Answers1

3

float(mpq) calls the GMP library function mpq_get_q. I checked the GMP source and mpq_get_d rounds the intermediate result towards 0. It does not calculate a correctly rounded result. (In this case, correctly rounded implies round to nearest with ties to even.) So it will occasionally be different than float(Fraction).

The GMP library is not optimized for floating point calculations. To get correctly rounded floating values, you should use the MFPR libary (aka the mpfr type in gmpy2).

The most accurate way to convert an mpq to a float is to convert it to an mpfr first. To avoid double-rounding, you should convert from mpq to mpfr with a precision of exactly 53 bits. So float(mpfr(mpq, 53)). (The default precision is currently 53 bits but that might change in the future. Specifying the desired precision, or ensuring the default context's precision is set to 53, is recommended.) This change makes mpq and Fraction return the same results in your example.

There is still one mpfr result that is different. That is just inherent in the fact that intermediate mpfrcalculations are rounding to the current precision (53 bits in this case).

Update to answer a question by @mattsun.

Why does mpfr("503.79")*(mpfr("1")+mpfr("0.07")) not equal "539.0553"?

Both Python's float type and gmpy2's mpfr type use a binary, or radix-2, representation. We normally use decimal, or radix-10, representation when we work with numbers. Just like 1/3 cannon be represented exactly in decimal arithmetic, most decimal numbers cannot be represented exactly with a binary representation. The calculations are done with values that are close, but not exactly equal, to the values given. The errors can accumulate, and the result will be just a little different than your expected value.

There are two options:

1) Format the string to the desired decimal format.

2) Use the decimal library.

Disclaimer: I maintain gmpy2.

casevh
  • 11,093
  • 1
  • 24
  • 35
  • Thanks for you answer. I'm not sure I understood this correctly. Given this expression, `mpfr("503.79")*(mpfr("1")+mpfr("0.07"))`, it gives me result `mpfr('539.0553000000001')`. However, I wanted to get value "539.0553". How? – pythonician_plus_plus Oct 19 '16 at 01:45
  • Regardless above example, what you said, convert mpq to mpfr(53bit) makes sense and is useful. – pythonician_plus_plus Oct 19 '16 at 01:56