
Usually I am lazy and calculate modular inverses as follows:

def inv(a,m):
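    # uses the identity: if b = inv(m%a,a), then (a-b)*m+1 is an exact multiple of a,
    # and ((a-b)*m+1)//a is the inverse of a mod m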
    return 1 if a==1 else ((a-inv(m%a,a))*m+1)//a
print(inv(100**7000,99**7001))

But I was curious to know whether passing more information back, namely a full solution to Bézout's identity (instead of just the one coefficient needed for the inverse), yields a faster or slower algorithm in practice:

def bez(a,b):
    # returns [x,y,d] where a*x+b*y = d = gcd(a,b) since (b%a)*y+a*(x+(b//a)*y)=d
    if a==0: return [0,1,b] if b>0 else [0,-1,-b]
    r=bez(b%a,a)
    return [r[1]-(b//a)*r[0],r[0],r[2]]
def inv(a,m):
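    # returns the inverse of a modulo m, or None if it does not exist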
    r=bez(a,m)
    if r[2]!=1: return None
    return r[0] if r[0]>=0 else r[0]+abs(m)
print(inv(100**7000,99**7001))

I was amazed to find that the latter ran more than 50 times faster than the former! But both use nearly the same number of recursive calls, with 1 integer division and 1 modulo operation per call, and the operands in the former have only roughly twice the bit-length on average (since the sequence of recursive arguments is identical in both), so I only expect its time complexity to be roughly 4 times that of the latter, not 50 times.
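
A minimal harness along these lines reproduces the comparison (inv_slow and inv_fast are just my relabelings of the two definitions above so that both can coexist; exact timings will of course vary by machine):

import time
for f in (inv_slow, inv_fast):   # the two definitions above, renamed
    t = time.perf_counter()
    f(100**7000, 99**7001)
    print(f.__name__, round(time.perf_counter()-t, 3), "s")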

So why am I observing this strange speedup?

Note that I am using Python 3, and I observed this when running it online, with the following code added at the top to stop the Python interpreter from complaining about exceeding the maximum recursion depth or stack space:

import resource,sys
sys.setrecursionlimit(1000000)
resource.setrlimit(resource.RLIMIT_STACK,[0x10000000,resource.RLIM_INFINITY])

1 Answer


I figured it out finally. Suppose the initial inputs are n-bit integers. In typical cases, apart from the first call to either method, the recursive calls have parameters whose sizes differ by just O(1) on average, and there are O(n) recursive calls on average; both facts stem from a number-theoretic phenomenon (the quotients arising in the Euclidean algorithm are small on average, so each step strips only O(1) bits). This O(1) size difference implies that the two methods actually have drastically different average time complexities. The implicit assumption in my question was that each multiplication or division of two n-bit integers takes roughly O(n^2) time; this holds in the worst case (for schoolbook arithmetic), but not in the average case for the fast method!
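
Both facts are easy to observe; for instance, an instrumented copy of bez (a quick sketch using ad-hoc globals; it also needs the recursion-limit settings from the question) shows the call count growing linearly with n while the average argument-size gap stays bounded:

import sys
sys.setrecursionlimit(1000000)

calls = 0
gaps = []
def bez(a, b):
    # same as the fast method, but records the call count and bit-size gaps
    global calls
    calls += 1
    if a > 0: gaps.append(b.bit_length() - a.bit_length())
    if a == 0: return [0,1,b] if b > 0 else [0,-1,-b]
    r = bez(b%a, a)
    return [r[1]-(b//a)*r[0], r[0], r[2]]

bez(100**2000, 99**2001)
print("calls:", calls)                          # grows like O(n)
print("mean size gap:", sum(gaps)/len(gaps))    # stays O(1)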

For the fast method, it can be proven that when p,q > 0, the triple [x,y,d] returned by bez(p,q) satisfies abs(x) ≤ q and abs(y) ≤ p. Thus when each call bez(a,b) computes b%a and (b//a)*r[0], the modulo, the division, and the multiplication each take O(size(a)) time on average, since size(b)-size(a) ∈ O(1) on average and abs(r[0]) ≤ a. The total time is therefore roughly O(n^2).
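
These bounds are straightforward to sanity-check numerically (a randomized check, not a proof):

import math, random

def bez(a,b):   # as defined in the question
    if a==0: return [0,1,b] if b>0 else [0,-1,-b]
    r=bez(b%a,a)
    return [r[1]-(b//a)*r[0],r[0],r[2]]

for _ in range(10**4):
    p = random.randrange(1, 10**9)
    q = random.randrange(1, 10**9)
    x, y, d = bez(p, q)
    assert p*x + q*y == d == math.gcd(p, q)
    assert abs(x) <= q and abs(y) <= p
print("bounds held on all samples")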

For the slow method, however, when each call inv(a,m) computes ((a-inv(m%a,a))*m+1)//a, the factor a-inv(m%a,a) has roughly the same size as a (and hence as m), so the multiplication produces a dividend roughly twice the size of the divisor a, and that unbalanced division takes O(size(a)^2) time! The total time is therefore roughly O(n^3).
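
The cost of that unbalanced division is easy to isolate (a sketch; CPython's big-integer division is the classical schoolbook algorithm, so doubling n should roughly quadruple the time):

import random, time
for n in (10**4, 2*10**4, 4*10**4):
    a = random.getrandbits(n) | (1 << (n-1))         # n-bit divisor, playing the role of a
    num = random.getrandbits(2*n) | (1 << (2*n-1))   # 2n-bit dividend, like (a-inv(m%a,a))*m+1
    t = time.perf_counter()
    for _ in range(100): num // a
    print(n, round(time.perf_counter()-t, 3), "s")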

Indeed, simulation yields very close agreement with the above analysis. For reference, using the fast method for inv(100**n,99**(n+1)) with n = 1400·k for k∈[1..6] took 17, 57, 107, 182, 281, 366 ms, while using the slow method with n = 500·k for k∈[1..6] took 22, 141, 426, 981, 1827, 3101 ms. These are fit very well by 9·k^2+9·k and 13·k^3+8·k^2 respectively, with mean fractional error ≈3.3% and ≈1.6% respectively. (We include the two highest-order terms because the second-order term still contributes a small but significant amount.)
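
For anyone wanting to reproduce the measurements, a timing loop of the following shape suffices (a sketch; it also needs the recursion-limit and stack-size settings from the question, and absolute times depend on the machine):

import sys, time
sys.setrecursionlimit(1000000)

def bez(a,b):   # fast method, as in the question
    if a==0: return [0,1,b] if b>0 else [0,-1,-b]
    r=bez(b%a,a)
    return [r[1]-(b//a)*r[0],r[0],r[2]]
def inv(a,m):
    r=bez(a,m)
    if r[2]!=1: return None
    return r[0] if r[0]>=0 else r[0]+abs(m)

for k in range(1, 7):
    n = 1400*k                      # 500*k was used for the slow method
    t = time.perf_counter()
    inv(100**n, 99**(n+1))
    print(k, round((time.perf_counter()-t)*1000), "ms")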
