11

I am trying to compute the time-complexity and compare it with the actual computation times.

If I am not mistaken, the time-complexity is O(log(n)), but looking at the actual computation times it looks more like O(n) or even O(nlog(n)).

What could be the reason for this difference?

def pow(n):
    """Return 2**n, where n is a nonnegative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x

Theoretical time-complexity:

[image: theoretical time-complexity]

Actual run times:

[image: actual run times]

Evan Gertis
Filip
  • I started a bounty so hope this question draws enough attention for a better answer. – knh190 May 18 '19 at 13:24
  • When performing analysis of this type, you have to be careful about your definitions. The size of the input to `pow` is *not* the magnitude of *n*, it's the number of bits needed to represent *n*. The number of necessary bits increases logarithmically with the magnitude of *n*. (Or put another way, the number of bits decreases linearly as the magnitude decreases exponentially. Dividing a number by 2 only decreases the number of bits needed to represent it by 1.) – chepner May 18 '19 at 13:37
  • `pow()` is a python builtin, you probably don't want to overwrite it. Also, just like how `pow(10,2000000,200000000000000)` doesn't take very long, python has some shortcuts that are used when using `pow()` – Eric Jin May 25 '19 at 12:22

7 Answers

9

I suspected your time measurement was not accurate, so I redid it using timeit; here are my stats:

import timeit
# N
sx = [10, 100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6]
# total runtime of 100 calls, in seconds
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]

Update:

[image: measured runtimes]

Well, the code really does run in about O(n*log(n)) time! A possible explanation is that multiplication/division is not O(1) for large numbers, so this part of the analysis doesn't hold:

T(n) = 1 + T(n//2)
     = 1 + 1 + T(n//4)
#      ^   ^
# each "1" assumes the multiplication and the division cost O(1),
# which stops being true when n is large

Experiment with multiplication and division:

mul = lambda x: x*x
div = lambda y: y//2

s1 = [timeit.timeit('mul(%d)' % i, number=1000, globals=globals()) for i in sx]
s2 = [timeit.timeit('div(%d)' % i, number=1000, globals=globals()) for i in sx]

And the plots look the same for mul and div - they do not appear to be O(1) (?). Small integers seem to be a bit more efficient, but there is no big difference for large integers. I don't know what the cause could be, then (though I'll keep this answer here in case it helps).

[image: mul and div timings]
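One more experiment worth sketching (it is not part of the plots above, and only the standard timeit module is assumed): time x*x where x is the huge value 2**n itself, rather than the small values in sx, which all fit in a machine word. This is where the non-constant cost of multiplication should become visible:

import timeit

# Square a number that has roughly n bits, for growing n.
for n in [10, 100, 1000, 10_000, 100_000, 1_000_000]:
    t = timeit.timeit('x*x', setup='x = 2**%d' % n, number=1000)
    print(n, t)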

knh190
  • Interestingly it is less than base 2 of log. – kabanus Apr 04 '19 at 07:54
  • @kabanus my last two data were wrong, I'm fixing it. – knh190 Apr 04 '19 at 07:56
  • I did not bother checking, but I do reproduce your results - and get a faster base than 2, now not quite as fast as what you have in your original version. – kabanus Apr 04 '19 at 07:57
  • @kabanus my previous stats were actually wrong. I think the explanation mentioned in another answer is quite on the right direction. – knh190 Apr 04 '19 at 08:29
  • But, this still doesn't explain why it's O(nlog(n)) because multiplication complexity doesn't lead to that. – knh190 Apr 04 '19 at 09:23
  • @knh190 ok, please tell me you are referring to the master theorem and not the 2 cent answer about the multiplication algorithm. – Evan Gertis Apr 04 '19 at 09:37
  • @EvanGertis Neither answer explains all to me, including my own tests. Can we wait for a better answer? It's a really interesting question and I want to start a bounty. – knh190 Apr 04 '19 at 09:43
  • Your last plot sure looks constant (enough) to me. Why are you saying they are not O(1)? – kabanus Apr 04 '19 at 11:55
  • @kabanus Idk what could be the cause then. Small integers seem to be more efficient, but not significantly? Maybe the performance is influenced by recursive call - different from a for-loop. Haven't tried though (not quite possible). – knh190 Apr 04 '19 at 13:51
6

The number of iterations will be log(n,2) but each iteration needs to perform a multiplication between two numbers that are twice as large as the preceding iteration's.

The best multiplication algorithms for variable precision numbers perform in O(N * log(N) * log(log(N))) or O(N^log(3)) where N is the number of digits (bits or words) needed to represent the number. It would seem that the two complexities combine to produce execution times that are larger than O(log(n)) in practice.

The digit count of the two numbers at iteration i is 2^i. So the total time will be the sum of the multiplication (x*x) complexities for the numbers going through the log(n) iterations.

To compute the function's time complexity based on the Schönhage–Strassen multiplication algorithm, we would need to add up the time complexity of each iteration using O(N * log(N) * log(log(N))):

∑ 2^i * log(2^i) * log(log(2^i)) [i = 0...log(n)]

∑ 2^i * i * log(i) [i = 0...log(n)]

which would be quite complex, so let's look at a simpler scenario.

If Python's variable precision multiplications used the most naive O(N^2) algorithm, the worst case time could be expressed as:

∑ (2^i)^2 [i = 0...log(n)]

∑ 4^i [i = 0...log(n)]    

(4^(log(n)+1)-1)/3    # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)

( 4*4^log(n) - 1 ) / 3

( 4*(2^log(n))^2 - 1 ) / 3    

(4*n^2-1)/3   # 2^log(n) = n

(4/3)*n^2-1/3

This would be O(n^2), which suggests that the log(n) iteration count is swamped by the multiplication's complexity profile.

We get the same result if we apply this reasoning to the Karatsuba multiplication algorithm: O(N^log(3)):

∑ (2^i)^log(3)    [i=0..log(n)]

∑ (2^log(3))^i    [i=0..log(n)]

∑ 3^i             [i=0..log(n)]

( 3^(log(n)+1) - 1 ) / 2   # because ∑K^i [i=0..n] = (K^(n+1)-1)/(K-1)

( 3*3^log(n) - 1 ) / 2

( 3*(2^log(3))^log(n) - 1 ) / 2

( 3*(2^log(n))^log(3) - 1 ) / 2

(3/2)*n^log(3) - 1/2

which corresponds to O(n^log(3)) and corroborates the theory.
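As a sanity check of the two closed forms above, here is a small sketch (not part of the original derivation) that sums the per-level costs directly and compares them with (4*n^2-1)/3 and (3/2)*n^log(3)-1/2, taking n as a power of two so that log(n) is an integer:

import math

def total_cost(n, exponent):
    """Sum the per-level multiplication costs (2**i)**exponent for i = 0..log2(n)."""
    levels = int(math.log2(n))
    return sum((2**i)**exponent for i in range(levels + 1))

for n in (2**k for k in range(1, 11)):
    naive = total_cost(n, 2)                    # schoolbook O(N^2) model
    karatsuba = total_cost(n, math.log2(3))     # Karatsuba O(N^log(3)) model
    print(n, naive, (4*n*n - 1) // 3, round(karatsuba, 1), round(1.5 * n**math.log2(3) - 0.5, 1))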

Note that the last column of your measurement table is misleading because you're making n progress exponentially. This changes the meaning of t[i]/t[i-1] and its interpretation for the evaluation of time complexity. It would be more meaningful if the progression between N[i] and N[i-1] were linear.
Taking the N[i]/N[i-1] ratio into account in the calculation, I found that the results seem to correlate more with O(n^log(3)), which would suggest that Python uses Karatsuba for large integer multiplications (tested with version 3.7.1 on macOS). However, this correlation is very weak.

FINAL ANSWER: O(log(N))

After doing more tests, I realized that there are wild variations in the time taken to multiply large numbers. Sometimes larger numbers take considerably less time than smaller ones. This makes the timing figures suspect, and correlating them with a time complexity based on a small and irregular sample is not going to be conclusive.
With a larger and more evenly distributed sample, the time strongly correlates (0.99) with log(N). This would mean the differences introduced by multiplication overhead only impact fixed points in the value range. Intentionally selecting values of N that are orders of magnitude apart exacerbated the impact of these fixed points, thus skewing the results.

So you can ignore all the nice theories I wrote above, because the data shows that the time complexity is indeed O(log(n)). You just have to use a more meaningful sample (and better rate-of-change calculations).
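For example, a sketch of that kind of sampling (an illustration, not the exact code used above; it assumes the question's pow is defined in the current module, and statistics.correlation needs Python 3.10+):

import math
import statistics
import timeit

# Linearly spaced values of n, then correlate the runtimes with log2(n).
ns = list(range(10_000, 1_000_001, 10_000))
times = [timeit.timeit('pow(%d)' % n, number=20, globals=globals()) for n in ns]
logs = [math.log2(n) for n in ns]
print(statistics.correlation(times, logs))   # a value near 1 points to ~O(log n)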

Alain T.
4

It's because multiplying 2 small numbers is O(1), but multiplying 2 long numbers (where N is the number) costs about O(log(N)**2) (see https://en.wikipedia.org/wiki/Multiplication_algorithm). So the time spent in each step increases, and the total is no longer O(log(N)).

1

This can be complex, but there are different cases that you will have to examine for different values of n, since this is recursive. This should explain it: https://en.wikipedia.org/wiki/Master_theorem_(analysis_of_algorithms).
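Concretely, if each multiplication is counted as O(1) work (the textbook assumption), the recurrence for this pow falls under case 2 of the master theorem. A short sketch of that reasoning:

T(n) = T(n/2) + O(1)      # a = 1, b = 2, f(n) = Θ(1)
n^(log_b a) = n^0 = 1     # f(n) = Θ(n^(log_b a)), so case 2 applies
T(n) = Θ(n^(log_b a) * log n) = Θ(log n)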

Evan Gertis
  • Okay so the user above me posts the same thing and that's *OK*. Also, the top ranked answer doesn't even answer the question smh. – Evan Gertis Apr 04 '19 at 09:32
1

You have to consider the true input size of the function. It's not the magnitude of n, but the number of bits needed to represent n, which is logarithmic in the magnitude. That is, dividing a number by 2 doesn't cut the input size in half: it only reduces it by 1 bit. This means that for an n-bit number (whose value is between 2^(n-1) and 2^n), the running time is indeed logarithmic in the magnitude, but linear in the number of bits.

    n       ln n                 bits to represent n
    --------------------------------------
    10     between 2 and 3      4 (1010)
   100     between 4 and 5      7 (1100100)
  1000     just under  7       10 (1111101000)
 10000     between 9 and 10    14 (10011100010000)

Each time you multiply n by 10, you are only adding 3-4 bits to the input size; you are not multiplying the input size by 10.
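If you want to reproduce the bit counts in the table, int.bit_length() gives them directly (a small check, not part of the original answer):

for n in (10, 100, 1000, 10000):
    print(n, n.bit_length(), bin(n))   # 4, 7, 10 and 14 bits respectively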

chepner
  • But are you sure it's the explanation behind CPython? Can you reference to some source code? – knh190 May 18 '19 at 20:52
  • This is a general concept, independent of your implementation. – chepner May 19 '19 at 02:17
  • The **n** parameter IS the number of bits given that we are calculating 2^n. So each recursion does cut the number of BITs in half (i.e. halves the size of x) which will result in log(n) iterations. The top level multiplication (first iteration) will be between two numbers with n//2 bits. e.g. n=1000 makes a 500 bits variable x, multiplied by itself, 2^500 * 2^500 --> 2^1000. It does not matter how many bits are needed to represent **n** because it is x that is being multiplied. – Alain T. May 22 '19 at 20:27
  • Yes, but the *input size* is the number of bits needed to represent `n`, regardless of what `n` itself represents. – chepner May 22 '19 at 21:02
1

For some integer values Python will internally use a "long representation", and in your case this happens somewhere after n=63, so your theoretical time complexity should be correct only for values of n < 63.

For "long representation" multiplying 2 numbers (x * y) have complexity bigger than O(1):

  • for x == y (e.g. x*x) complexity is around O(Py_SIZE(x)² / 2).

  • for x != y (e.g. 2*x) multiplication is performed like "schoolbook long multiplication", so the complexity will be O(Py_SIZE(x)*Py_SIZE(y)). In your case it might affect performance a little too, because 2*x*x computes (2*x)*x, while the faster way would be to compute 2*(x*x) (see the sketch after the next paragraph).

So for n >= 63 the theoretical complexity must also account for the complexity of the multiplications.
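As a rough check of the (2*x)*x versus 2*(x*x) remark, one can time both groupings on a single large power of two (a sketch with an arbitrarily chosen exponent, not a careful benchmark):

import timeit

setup = 'x = 2**1_000_000'
print(timeit.timeit('2*x*x', setup=setup, number=50))    # evaluated as (2*x)*x
print(timeit.timeit('2*(x*x)', setup=setup, number=50))  # square first, then double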

It's possible to measure the "pure" complexity of the custom pow (ignoring the complexity of multiplication) if you can reduce the complexity of multiplication to O(1). For example:

SQUARE_CACHE = {}
HALFS_CACHE = {}


def square_and_double(x, do_double=False):
    key = hash((x, do_double))
    if key not in SQUARE_CACHE:
        if do_double:
            SQUARE_CACHE[key] = 2 * square_and_double(x, False)
        else:
            SQUARE_CACHE[key] = x*x
    return SQUARE_CACHE[key]


def half_and_remainder(x):
    key = hash(x)
    if key not in HALFS_CACHE:
        HALFS_CACHE[key] = divmod(x, 2)
    return HALFS_CACHE[key]


def pow(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    x = pow(n//2)
    return square_and_double(x, do_double=bool(n % 2 != 0))


def pow_alt(n):
    """Return 2**n, where n is a non-negative integer."""
    if n == 0:
        return 1
    half_n, remainder = half_and_remainder(n)
    x = pow_alt(half_n)
    return square_and_double(x, do_double=bool(remainder != 0))

import timeit
import math


# Values of n:
sx = sorted([int(x) for x in [100, 1000, 10e4, 10e5, 5e5, 10e6, 2e6, 5e6, 10e7, 10e8, 10e9]])

# Fill caches of `square_and_double` and `half_and_remainder` to ensure that complexity of both `x*x` and of `divmod(x, 2)` are O(1):
[pow_alt(n) for n in sx]

# Total runtime of 500 calls, in ms:
sy = [timeit.timeit('pow_alt(%d)' % n, number=500, globals=globals())*1000 for n in sx]

# Theoretical values:
base = 2
sy_theory = [sy[0]]
t0 = sy[0] / (math.log(sx[0], base))
sy_theory.extend([
    t0*math.log(x, base)
    for x in sx[1:]
])
print("real timings:")
print(sy)
print("\ntheory timings:")
print(sy_theory)

print('\n\nt/t_prev:')
print("real:")
print(['--' if i == 0 else "%.2f" % (sy[i]/sy[i-1]) for i in range(len(sy))])
print("\ntheory:")
print(['--' if i == 0 else "%.2f" % (sy_theory[i]/sy_theory[i-1]) for i in range(len(sy_theory))])
# OUTPUT:
real timings:
[1.7171500003314577, 2.515988002414815, 4.5264500004122965, 4.929114998958539, 5.251838003459852, 5.606903003354091, 6.680275000690017, 6.948587004444562, 7.609975000377744, 8.97067000187235, 16.48820400441764]

theory timings:
[1.7171500003314577, 2.5757250004971866, 4.292875000828644, 4.892993172417281, 5.151450000994373, 5.409906829571465, 5.751568172583011, 6.010025001160103, 6.868600001325832, 7.727175001491561, 8.585750001657289]


t/t_prev:
real:
['--', '1.47', '1.80', '1.09', '1.07', '1.07', '1.19', '1.04', '1.10', '1.18', '1.84']

theory:
['--', '1.50', '1.67', '1.14', '1.05', '1.05', '1.06', '1.04', '1.14', '1.12', '1.11']

The results are still not perfect, but they are close to the theoretical O(log(n)).

imposeren
0

You can generate textbook-like results if you count what the textbooks count, the steps taken:

import math

def pow(n):
    """Return 2**n, where n is a nonnegative integer."""
    global calls
    calls += 1
    if n == 0:
        return 1
    x = pow(n//2)
    if n%2 == 0:
        return x*x
    return 2*x*x

def steppow(n):
    global calls
    calls=0
    pow(n)
    return calls

sx = [math.pow(10,n) for n in range(1,11)]
sy = [steppow(n)/math.log(n) for n in sx]
print(sy)

Then it produces something like this:

[2.1714724095162588, 1.737177927613007, 1.5924131003119235, 1.6286043071371943, 1.5634601348517065, 1.5200306866613815, 1.5510517210830421, 1.5200306866613813, 1.4959032154445342, 1.5200306866613813]

Where the 1.52... appears to be some kind of favourite. (It is simply the call count, floor(log2(n)) + 2, divided by the natural log of n, so it slowly approaches 1/ln(2) ≈ 1.44.)

But the actual runtime also includes the seemingly innocent mathematical operations, which grow in complexity too as the number physically grows in memory. CPython uses a number of multiplication implementations, branching at various points:

long_mul is the entry:

if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
    stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
    return PyLong_FromLongLong((long long)v);
}

z = k_mul(a, b);

If the numbers fit into a CPU word, they get multiplied in place (but the result may be larger, hence the LongLong (*)); otherwise they will use k_mul(), which stands for Karatsuba multiplication, and which also checks a couple of things based on size and value:

i = a == b ? KARATSUBA_SQUARE_CUTOFF : KARATSUBA_CUTOFF;
if (asize <= i) {
    if (asize == 0)
        return (PyLongObject *)PyLong_FromLong(0);
    else
        return x_mul(a, b);
}

For shorter numbers a classic algorithm is used, x_mul(), and the shortness check also depends on whether the product is a square, because x_mul() has an optimized code path for calculating x*x-like expressions. However, above a certain in-memory size the code stays in k_mul(), but then it does another check on how different the magnitudes of the two values are:

if (2 * asize <= bsize)
    return k_lopsided_mul(a, b);

possibly branching to yet another algorithm, k_lopsided_mul(), which is still Karatsuba, but optimized for multiplying numbers with significant difference in magnitude.

In short, even the 2*x*x matters: if you replace it with x*x*2, the timeit results will differ (2*x*x evaluates left to right as (2*x)*x, a product of two different large numbers, while x*x*2 squares first via the optimized x*x path and then does a cheap doubling):

2*x*x: [0.00020009249478223623, 0.0002965123323532072, 0.00034258906889154733, 0.0024181753953639975, 0.03395215528201522, 0.4794894526936972, 4.802882867816082]
x*x*2: [0.00014974939375012042, 0.00020265231347948998, 0.00034002925019471775, 0.0024501731290706985, 0.03400164511014836, 0.462764023966729, 4.841786565730171]

(measured as

import timeit
sx = [math.pow(10,n) for n in range(1,8)]
sy = [timeit.timeit('pow(%d)' % i, number=100, globals=globals()) for i in sx]

)

(*) By the way, as the size of the result is often overestimated (for instance at the very beginning, long*long may or may not fit into a long afterwards), there is also a long_normalize function, which at the end does not spend time on freeing the extra memory (see the comment above it), but it still sets the correct size for the internal object, which involves a loop counting the zeroes in front of the actual number.

tevemadar