How to improve divide-and-conquer runtimes?

Question

When a divide-and-conquer recursive function doesn't yield runtimes low enough, which other improvements could be done?

Let's say, for example, this power function taken from here:

def power(x, y):
    if (y == 0): return 1
    elif (int(y % 2) == 0):
        return (power(x, int(y / 2)) * power(x, int(y / 2)))
    else:
        return (x * power(x, int(y / 2)) * power(x, int(y / 2)))

Since it's recursive nature, memoizing and tail recursion (not sure if it can be applied alongside divide-and-conquer) could help substantially but I don't know of any other tweaks. For my benchmarks, I'm using:

base = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
exponent = 1000000

Taking 301.140625s to finish and I still need for it to be able to handle bigger bases and exponents... maybe dividing the problem into more than 2 subproblems?

The memoizer being used can be found here

Do you **have** to use recursion or using an iterative approach is also fine ? — qcoumes, Jun 15 '21 at 14:18
@QuentinCoumes both are fine altough I'm interested on improving divide-and-conquer. I mean, improving recursive functions. — loko, Jun 15 '21 at 14:19
Easy improvements would be replacing multiple calls to `power(x, int(y / 2))` to a unique one using a variable (even if *memoization* helps regarding this), you could also consider using bit shifting instead of division and modulus. But don't forget that pure python is not really made for compute-intensive tasks. — qcoumes, Jun 15 '21 at 14:31

Mark Saving · Accepted Answer · 2021-06-15T17:36:47.993

The primary optimization you should use here is common subexpression elimination. Consider your first piece of code:

def power(x, y):
    if y == 0: 
        return 1
    elif y % 2 == 0:
        return (power(x, y // 2) * power(x, y // 2)
    else:
        return (x * power(x, y // 2) * power(x, y // 2)

I slightly cleaned it up - since y is an int, y % 2 will also be an int and there is thus no need to convert types. And the canonical way to write int(y / 2) is y // 2. Finally, in if and while statements, there is no need to put parentheses around the Boolean clause since we end the clause with a semi-colon (whereas in C-like syntax, it's possible to have a 1-line if statement and therefore we need parentheses).

The problem here is that you are computing power(x, y // 2) twice in both the elif and the else cases. You should instead try to compute this value only once.

def power(x, y):
    if (y == 0): 
        return 1
    else:
        smaller_power = power(x, y // 2)
        if y % 2 == 1:
            return x * smaller_power * smaller_power
        else:
            return smaller_power * smaller_power

This immediately and dramatically improves the efficiency of your algorithm. Instead of being O(y) in time, the improved version is O(log y) in time (assuming we count a single * as 1 operation).

With a bit of cleverness, we can come up with a slightly different algorithm:

def power(x, y):
    if y == 0:
        return 1
    else:
        smaller_power = power(x * x, y // 2)
        return x * smaller_power if y % 2 == 1 else smaller_power

which can be transformed into the following tail-recursive version:

def power_tail_helper(x, y, acc):
    """
    returns acc * power(x, y)
    """
    if y == 0:
        return acc
    else:
        new_acc = acc * x if y % 2 == 1 else acc
        return power_tail_helper(x * x, y // 2, new_acc)

def power_tail(x, y):
    return power_tail_helper(x, y, 1)

which in turn can be turned into the following iterative algorithm:

def power_iter(x, y):
    acc = 1
    while y != 0:
        (x, y, acc) = (x * x, y // 2, acc * x if y % 2 == 1 else acc)
    return acc

Note that in idiomatic Python, we would instead write this as

def power_iter(x, y):
    acc = 1
    while y:
        x, y, acc = (x * x, y // 2, acc * x if y % 2 else acc)
    return acc

using the fact that numbers are automatically converted to Booleans in the appropriate context, with 0 being False and all other numbers being True.

The last approach you could use is a more mathematical one. You could compute the exponent using logarithms. This would require a high-precision logarithm library to get an exact answer, since base is a 320-bit number, far too large for the ordinary 64-bit double precision floating point arithmetic version of log to be precise enough. This is probably the optimal approach, but you'd have to do more research.

Either way, keep in mind that the size of the output is going to be a number that takes over 100 million bytes to store. So no matter what algorithm you use, it's going to take a while, since even the "algorithm" of having the answer stored in memory and reading it out is going to take a while.

Your solution is pretty awesome, thank you. I know about `BigFloat`, which is a wrapper for MPFR, but I'm having troubles to use it on Windows. I will take your solution meanwhile. — loko, Jun 15 '21 at 19:00
@loko Something I forgot to mention is that when you're dealing with massive numbers (eg 100 million byte numbers), the bottleneck will probably be the multiplication algorithm. So if you're going to be using this in production for large integers, you'll probably want to look into fast multiplication algorithms based on FFT. — Mark Saving, Jun 15 '21 at 19:36

How to improve divide-and-conquer runtimes?

1 Answers1