I'm using IPython to time some short examples and then I'll try to explain the results.
from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000000 loops, best of 3: 669 ns per loop
%timeit a/3
1000000 loops, best of 3: 464 ns per loop
get_context().precision=10000
a=mpfr(1);b=mpfr(3)
%timeit a/b
100000 loops, best of 3: 12.9 µs per loop
%timeit a/3
1000000 loops, best of 3: 1.33 µs per loop
get_context().precision=100000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 8.13 µs per loop
Notice that as the precision increases, the running time for a/b
increases more rapidly than a/3
. When calculating a/b
, MPFR uses the full precision of both values and the running time is (roughly) O(n * ln(n)). When calculating a/3
, MPFR uses a short, but exact, representation of 3 and the running time is (roughly) O(n). This explains why a/b
is slower than a/3
for high precision. (n is the length of a
in bits.)
When Python calculates scale//3
, it takes advantage of the fact that 3 will fit into a single digit
and running time is linear in the length of scale
. This is effectively the same calculation as a/3
but since the underlying GMP library is faster than Python, a/3
is computed faster than scale//3
.
Here is a short example of the difference in performance between Python and GMP.
from gmpy2 import mpz
scale = 10**100000
%timeit scale//3
10000 loops, best of 3: 162 µs per loop
scale = mpz(scale)
%timeit scale//3
100000 loops, best of 3: 19 µs per loop
You were measuring the performance between an n
by n
division and an n
by k
division when you compared a/b
and a/3
. (n
is the length of a
in bits, and k
is much, much smaller than n
.) When you compared scale//3
and `a/3', you were comparing a simple, straight-forward division implementation with a highly-optimized implementation.
Implementation note: In the current unstable development branch, a/3
calls mpfr_div_ui
directly. This eliminates the creation of a temporary object by MPFR. This improves the performance as shown below.
from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000000 loops, best of 3: 593 ns per loop
%timeit a/3
1000000 loops, best of 3: 231 ns per loop
get_context().precision=10000
a=mpfr(1); b=mpfr(3)
%timeit a/b
100000 loops, best of 3: 12.7 µs per loop
%timeit a/3
1000000 loops, best of 3: 927 ns per loop
get_context().precision=100000
a=mpfr(1);b=mpfr(3)
%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 6.77 µs per loop