
I've always assumed that integer division is faster than floating-point division, but I ran some tests that seem to show otherwise.

import gmpy2, time, math

digits = 100000

scale = 10**digits  # Decimal precision
gmpy2.get_context().precision = int(math.log2(10) * digits)  # Binary precision

def start_timer():
    global start_time  
    start_time = time.time()

def print_timer():
    print("%s s" % (time.time() - start_time))

start_timer()
for i in range(1000):
    x = scale // 3  # big integer divided by a small integer
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / 3  # mpfr divided by a small integer
print_timer()

start_timer()
for i in range(1000):
    x = gmpy2.mpfr(1) / gmpy2.mpfr(3)  # mpfr divided by another full-precision mpfr
print_timer()

The integer division took 0.17 seconds, the mpfr-by-integer division took 0.06 seconds, and the division between two mpfr floating-point numbers took 15.56 seconds.

My questions:

  1. Am I setting up this test correctly?
  2. Is mpfr division really more optimized than native division?
  3. Is division involving a floating point and an integer much faster than that involving two floating point numbers?
qwr
  • Notice that you're including the time to create the `mpfr` objects as well as the time to do the division in your tests. Also, I'd suggest using the `timeit` module for this sort of thing (see the sketch after these comments). – Mark Dickinson Jul 29 '14 at 20:47
  • And are you *really* surprised that dividing a 330000-bit float by a 2-bit integer is much faster than dividing a 330000-bit float by another 330000-bit float? :-) – Mark Dickinson Jul 29 '14 at 20:50
  • @MarkDickinson The docs say that during division both numbers are converted to the correct type, so I assumed this meant both numbers had to be integers or floating point – qwr Jul 29 '14 at 20:52
  • I'd guess that in the second division, `gmpy2` is converting the RHS `3` into an `mpfr` instance with small precision (since that can be done exactly with no loss of accuracy), while in the third division you're creating two full-blown `mpfr` instances at the current precision. – Mark Dickinson Jul 29 '14 at 20:56
  • And Python's own long-integer division algorithm is simple and portable but not particularly optimised. It's not really aimed at 100000-digit calculations. That's what gmpy2 is there for. So it's not really surprising that 'native' division is a bit slower than MPFR's division. – Mark Dickinson Jul 29 '14 at 20:58
  • @MarkDickinson Can you combine all your comments into an answer that other people can refer to? I believe it's better than the current answer concerning the internals of gmpy2 – qwr Jul 29 '14 at 21:08
  • From the source, it looks as though the mpfr / int case uses MPFR's `mpfr_div_z`, after converting the Python int to an `mpz` instance. – Mark Dickinson Jul 29 '14 at 21:08
  • I'll wait a bit and see if @casevh passes by; he's best placed to give a good answer to this question. – Mark Dickinson Jul 29 '14 at 21:09
  • @MarkDickinson I just passed by... I won't be able to give a detailed answer for a few hours. – casevh Jul 29 '14 at 22:15
  • @MarkDickinson That's good, casevh has answered some of my other questions as well – qwr Jul 30 '14 at 00:42
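
To address the construction-cost point from the comments, here is a minimal sketch using the standard timeit module; the operands are built once in the setup string, so only the division itself is measured (the exact numbers will of course differ from those above):

import timeit

setup = """
import gmpy2, math
digits = 100000
gmpy2.get_context().precision = int(math.log2(10) * digits)
scale = 10**digits
a = gmpy2.mpfr(1)
b = gmpy2.mpfr(3)
"""

# Time each division with operand construction excluded.
for stmt in ("scale // 3", "a / 3", "a / b"):
    t = timeit.timeit(stmt, setup=setup, number=1000)
    print("%-10s %.4f s per 1000 iterations" % (stmt, t))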

3 Answers


I'm using IPython to time some short examples and then I'll try to explain the results.

from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000000 loops, best of 3: 669 ns per loop
%timeit a/3
1000000 loops, best of 3: 464 ns per loop

get_context().precision=10000
a=mpfr(1);b=mpfr(3)

%timeit a/b
100000 loops, best of 3: 12.9 µs per loop
%timeit a/3
1000000 loops, best of 3: 1.33 µs per loop

get_context().precision=100000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 8.13 µs per loop

Notice that as the precision increases, the running time for a/b increases more rapidly than a/3. When calculating a/b, MPFR uses the full precision of both values and the running time is (roughly) O(n * ln(n)). When calculating a/3, MPFR uses a short, but exact, representation of 3 and the running time is (roughly) O(n). This explains why a/b is slower than a/3 for high precision. (n is the length of a in bits.)
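
If you want to reproduce these measurements outside IPython, a sketch along the following lines should show the same trend with the standard timeit module (absolute numbers will vary by machine and by the gmpy2/MPFR versions):

import timeit

# Time a/b and a/3 at increasing precision; a/b should grow
# faster than linearly, while a/3 grows roughly linearly.
for prec in (1000, 10000, 100000):
    setup = ("from gmpy2 import mpfr, get_context; "
             "get_context().precision = %d; "
             "a = mpfr(1); b = mpfr(3)" % prec)
    t_ab = timeit.timeit("a / b", setup=setup, number=1000)
    t_a3 = timeit.timeit("a / 3", setup=setup, number=1000)
    print("precision %6d: a/b %.6f s, a/3 %.6f s" % (prec, t_ab, t_a3))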

When Python calculates scale//3, it takes advantage of the fact that 3 fits into a single digit, so the running time is linear in the length of scale. This is effectively the same calculation as a/3, but since the underlying GMP library is faster than Python's own integer implementation, a/3 is computed faster than scale//3.

Here is a short example of the difference in performance between Python and GMP.

from gmpy2 import mpz
scale = 10**100000

%timeit scale//3
10000 loops, best of 3: 162 µs per loop

scale = mpz(scale)

%timeit scale//3
100000 loops, best of 3: 19 µs per loop

You were measuring the performance of an n by n division against an n by k division when you compared a/b and a/3. (n is the length of a in bits, and k is much, much smaller than n.) When you compared scale//3 and a/3, you were comparing a simple, straightforward division implementation with a highly optimized one.
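
To make n and k concrete, you can inspect the sizes directly (this assumes the precision attribute that current gmpy2 releases expose on mpfr instances):

from gmpy2 import mpfr, get_context
get_context().precision = 100000

a = mpfr(1)
print(a.precision)       # n: the working precision of a, in bits -> 100000
print((3).bit_length())  # k: bits needed for the divisor 3 -> 2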

Implementation note: In the current unstable development branch, a/3 calls mpfr_div_ui directly. This eliminates the creation of a temporary object by MPFR and improves the performance, as shown below.

from gmpy2 import mpfr, get_context
get_context().precision=1000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000000 loops, best of 3: 593 ns per loop
%timeit a/3
1000000 loops, best of 3: 231 ns per loop

get_context().precision=10000
a=mpfr(1); b=mpfr(3)

%timeit a/b
100000 loops, best of 3: 12.7 µs per loop
%timeit a/3
1000000 loops, best of 3: 927 ns per loop

get_context().precision=100000
a=mpfr(1);b=mpfr(3)

%timeit a/b
1000 loops, best of 3: 505 µs per loop
%timeit a/3
100000 loops, best of 3: 6.77 µs per loop
casevh

A note about the GNU MPFR implementation (I'm an MPFR developer, though I haven't really worked on the division): the selection of the best algorithm for multiplication and division is quite difficult, because there are various parameters (in particular, the precisions of the inputs and output, and whether the inputs can be represented with a smaller precision because of trailing zeros), and some cases may be more difficult to round than others. Moreover, the algorithms, and thus the timings, may change from one release to another, improving some cases while at the same time making others slower. Even recently (two months ago), we had a discussion about whether to do special recognition of constant powers of two for the integer argument in mpfr_mul_ui and mpfr_div_ui.

If you really want to compare integer division with MPFR floating-point division, you should do the comparison with GMP's integer division. MPFR is based on GMP's division, but not naively. The best way to know what MPFR is doing is to use MPFR logging (this may require a rebuild with --enable-logging) together with the corresponding environment variables. Note that when logging is enabled in the MPFR build, MPFR may be a bit slower even when logging is not used.
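
One way to set up that comparison from Python is through gmpy2's mpz type, which uses GMP's integer division; a rough sketch (the operand sizes below are picked so that both divisions produce a quotient of roughly the same number of significant bits, and the timings will depend on the GMP/MPFR versions):

import timeit

setup = """
from gmpy2 import mpz, mpfr, get_context
digits = 100000
get_context().precision = int(3.33 * digits)
num = mpz(10)**(2 * digits)   # ~2n-bit numerator (GMP integer)
den = mpz(10)**digits // 3    # ~n-bit divisor (GMP integer)
a = mpfr(1); b = mpfr(3)      # n-bit MPFR operands
"""

# GMP integer division vs MPFR floating-point division.
for stmt in ("num // den", "a / b"):
    print(stmt, timeit.timeit(stmt, setup=setup, number=100))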

vinc17

Floating-point division is typically faster than integer division on the CPU. One could speculate that this has to do with the FPU being more optimized for this operation, or with the floating-point representation making division easier, but whatever the reasons, they won't change the fact. In the end, the only way to obtain concrete answers to your second and third questions is to test it. And yes, your tests look OK to me.

If I had to venture a guess, I would say that the case where you divide an MPFR number by an integer is faster because the library can use the integer's limited precision to its advantage when computing the division.

flodin
  • The FPU probably doesn't even come into it. The MPFR floats are represented internally using integer limbs, and it's likely that all the computation is using integer arithmetic. – Mark Dickinson Jul 29 '14 at 20:42