7

I recently decided to look at factorial algorithms for large integers, and this "divide and conquer" algorithm is faster than a simple iterative approach and the prime-factoring approach:

def multiply_range(n, m):
    if m < n:
        return 1
    if n == m:
        return n
    mid = (n + m) // 2
    return multiply_range(n, mid) * multiply_range(mid + 1, m)

def factorial(n):
    return multiply_range(1, n)

I understand why the algorithm works: it just breaks the multiplication into smaller parts recursively. What I don't understand is why this method is faster.
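For what it's worth, a quick sanity check (Python 3 syntax) shows the recursion gives the same answers as the standard library:

```python
import math

def multiply_range(n, m):
    # Multiply all integers in [n, m] by splitting the range in half.
    if m < n:
        return 1
    if n == m:
        return n
    mid = (n + m) // 2
    return multiply_range(n, mid) * multiply_range(mid + 1, m)

def factorial(n):
    return multiply_range(1, n)

assert factorial(10) == math.factorial(10)  # 3628800
assert factorial(0) == 1                    # empty range -> 1
```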

Broseph
  • 1,655
  • 1
  • 18
  • 38
  • I don't think it's any faster than a factorial with a traditional for loop. Have you measured it? – kennytm Dec 01 '12 at 07:11
  • It is faster for very big numbers. If you do it iteratively, you are multiplying ever larger numbers, which gets slower and slower. If you can reduce the sizes of the numbers by, say, cutting them in halves (recursively), calculations get asymptotically faster. But the larger overhead means this only works for sizes above a certain threshold. – Rudy Velthuis Jul 26 '16 at 14:17

2 Answers

7

Contrary to @NPE's answer, your method is faster, but only for very large numbers. For me, the divide and conquer method began to pull ahead for inputs around 10^4. At 10^6 and above there is no comparison: a traditional loop fails miserably.

I'm no expert on arbitrary-precision multipliers, and I hope someone can expand on this, but my understanding is that multiplication is done digit by digit, the same way we are taught in grade school.

A traditional factorial loop starts with small numbers, and the result keeps growing. By the end you are multiplying a ginormous number by a comparatively small one, an expensive calculation due to the mismatch in digit counts.

For example, compare:

import operator
from functools import reduce  # reduce is a builtin on Python 2

reduce(operator.mul, range(1, 10**5))
reduce(operator.mul, range(10**5, 1, -1))

The second is slower because its partial product grows large early on, so the more expensive multiplications start sooner.
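You can see this directly by checking how big the partial product is halfway through each order (an illustrative sketch, not a timing measurement):

```python
def halfway_bits(seq):
    # Bit length of the partial product after consuming half the factors.
    seq = list(seq)
    acc = 1
    for x in seq[: len(seq) // 2]:
        acc *= x
    return acc.bit_length()

asc = halfway_bits(range(1, 10**3))       # small factors first
desc = halfway_bits(range(10**3, 1, -1))  # large factors first
# desc comes out larger: the descending order builds a big partial
# product sooner, so each subsequent multiply handles more digits.
assert desc > asc
```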

Your method is faster than either of these by orders of magnitude for large numbers because it divides the factorial into similarly sized parts. The sub-results have similar numbers of digits and multiply faster.
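A rough way to reproduce the comparison yourself (exact timings vary by machine and Python version; `multiply_range` here is the questioner's function, ported to Python 3 syntax):

```python
import math
import operator
import timeit
from functools import reduce

def multiply_range(n, m):
    # The divide and conquer product from the question.
    if m < n:
        return 1
    if n == m:
        return n
    mid = (n + m) // 2
    return multiply_range(n, mid) * multiply_range(mid + 1, m)

n = 10**4
# Both approaches must agree before the timing means anything
assert multiply_range(1, n) == math.factorial(n)

t_split = timeit.timeit(lambda: multiply_range(1, n), number=3)
t_loop = timeit.timeit(lambda: reduce(operator.mul, range(1, n + 1)), number=3)
print(f"divide and conquer: {t_split:.3f}s  loop: {t_loop:.3f}s")
```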

kalhartt
  • 3,999
  • 20
  • 25
  • 2
This observation is true. However, it's worth pointing out that it is predicated on some highly unrealistic assumptions, namely that anyone would want to compute factorial: (1) iteratively (rather than analytically); (2) exactly; and (3) using integer maths, and do this on inputs of the order of 10**5. – NPE Dec 01 '12 at 17:12
  • 3
I think the OP makes these assumptions, and secondly it's usually around 10^(10^x) that you switch to Stirling's approximation – kalhartt Dec 01 '12 at 17:48
3

The short answer is that you're mistaken. It is not very fast:

In [34]: %timeit factorial(100)
10000 loops, best of 3: 57.6 us per loop

In [35]: %timeit reduce(operator.mul, range(1, 101))
100000 loops, best of 3: 19.9 us per loop

In other words, it is about three times slower than a straightforward one-liner.

For smaller values of n the difference is even more dramatic.

NPE
  • 486,780
  • 108
  • 951
  • 1,012