Why does this make my prime-generating algorithm take longer?

Question

I'm trying to find all primes below 1 000 000. As all non-primes can be factorised into primes, my method is to start my list of primes as [2, 3] and then run through every number up to 1 000 000. If a number is divisible by any number in prime_list, then it's not prime, and I move onto the next number. If this number isn't divisible by any number in prime_list, then it must be prime, and it gets added to this list.

To try and make this more efficient, I added a statement to only check if a number in question is divisible by values below the square root of this number. I thought this would cut out a lot of computation time but it actually makes my program take longer. Can anyone explain why?

Here's my code:

import math

import time
start_time = time.time()

prime = [2, 3]

def prime_checker(a):
    for j in prime:
        if j < (int(math.sqrt(a)) +1 ):     /// without this line, the program runs faster
            if a % j == 0:
                return False

for i in range (2, 100000):
    if prime_checker(i) != False:
        prime.append(i)

print prime[-1]

print "Time taken = ", time.time() - start_time

You are repeatedly performing an expensive operation that is independent of the value of `j`. Compute `math.sqrt(a)` *once* outside the loop. — chepner, Dec 23 '16 at 15:36
See also: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes — Robᵩ, Dec 23 '16 at 15:50
You can probably do a little better: keep an index to the largest prime which, when squared, doesn't exceed the currently tested number. When you go from i to i+1, you can update this index by a single square-root-free test. Then you immediately know the range of divisors to be tried. — , Dec 23 '16 at 21:27

score 3 · Answer 1 · answered Dec 23 '16 at 18:27

To further speed up your algorithm, note that 2 is the only even prime. All other even numbers are composite. You already have prime = [2, 3] so you can start searching at 5 (4 is not prime) and only check odd numbers:

for i in range (5, 100000, 2):
    if prime_checker(i) != False:
        prime.append(i)

chepner · Answer 2 · 2016-12-24T15:43:33.610

2

The time you spend repeatedly calculating the square root of a overwhelms whatever you save by skipping larger primes. Compute the square root once, which in my test is 10x faster than calculating it repeatedly (and about 3x faster than leaving the line out altogether).

def prime_checker(a):
    limit = int(math.sqrt(a)) + 1
    for j in prime:
        if j > limit:
            break
        if a % j == 0:
            return False
    return True

for i in range (2, 100000):
    if prime_checker(i):
        prime.append(i)

edited Dec 24 '16 at 15:43

answered Dec 23 '16 at 15:41

chepner

497,756
71
530
681

You don't actually need to compute the square root at all; instead, square the prime you're testing against. – Nick Johnson Dec 23 '16 at 16:14
One square root is still going to be faster once you have to square more than 5 primes or so in the body of the loop. – chepner Dec 23 '16 at 16:25
I'm pretty sure a square root costs a lot more than five multiplies. – Nick Johnson Dec 24 '16 at 13:49
Possibly. However, the point is that for a prime number `a`, you would need `O(lg (sqrt(a)))` multiplications, which is probably *not* faster than a single square root of `a`. – chepner Dec 24 '16 at 14:02
for a *prime* `a` you'd need *O( sqrt(a) / lg(sqrt(a)))* multiplications with the optimal trial division algorithm, but *O(a / lg(a))* with OP's. i.e., the OP is *not* skipping the larger primes. Even your code isn't. :) – Will Ness Dec 24 '16 at 15:22
Ah, OK, that last point I can fix. I'm not sure figuring the break-even point between squaring primes and just calculating a single square-root is going to be worth the effort. – chepner Dec 24 '16 at 15:43

Will Ness · Answer 3 · 2016-12-24T15:36:53.223

1

"If this number isn't divisible by any number in prime_list" that is not greater than this number's square root, then it is a prime.

And once you've hit a prime above the square root, all the rest of them will be so as well. We know this in advance.

The point is not to check whether to avoid each extraneous check, but to prevent all of them, in advance. This will speed up your code 100x, if not 1000x or even more.

In other words the real problem isn't the repeated calculation of the sqrt, but the incorrect handling of the limiting condition. With the correct handling of the limiting condition, even the repeated calculation of sqrt shouldn't matter much.

and the correct way is: break out from the loop as soon as possible, i.e. immediately when the first prime above the square root is reached; possibly by directly returning True right then and there.

edited Dec 24 '16 at 15:36

answered Dec 24 '16 at 11:57

Will Ness

70,110
9
98
181

I think he is aware of this, which is why he added the comparison to the square root. The problem was he was doing too many square-root calculations, which was more expensive than just checking the too-large primes. – chepner Dec 24 '16 at 14:04
1

@chepner what he's doing is ~ *k^2*; what I propose is ~ *k^1.5*, in *k* primes produced. Perhaps I was a bit too cryptic. I've edited for clarity. – Will Ness Dec 24 '16 at 15:11
1

IOW with the correct handling of the sqrt condition, even the repeated calculation of sqrt shouldn't matter much. – Will Ness Dec 24 '16 at 15:30

score 0 · Answer 4 · answered Dec 28 '16 at 20:17

Using **0.5 instead of math.sqrt also tends to be exponentially faster:

>>> import timeit
>>> print(timeit.timeit('math.sqrt(1830374)', setup='import math', number=10000))
0.0020401941434329274
>>> print(timeit.timeit('1830374**0.5', number=10000))
0.00015091430498159752

Chepner's code is the right answer though, just don't forget to iterate like rossum said. Iteration the way he stated will literally save you half a million calls (though, they will break quickly with Chepner's algorithm, that's still a lot of wasted time).

Andreikkaa · Answer 5 · 2016-12-28T22:45:35.737

0

Your solution consumes O(N**1.5) time. To faster it, use Sieve of Eratosthenes. It's time complexity is O(NloglogN)).

n = 10 ** 5
sieve = [True] * n
sieve[0] = sieve[1] = False

for i in range(2, n):
    if sieve[i]:
        for j in range(i + i, n, i):
            sieve[j] = False

primes = []
for i, j in enumerate(sieve):
    if j:
        primes.append(i)

edited Dec 28 '16 at 22:45

answered Dec 28 '16 at 22:40

Andreikkaa

297
1
7

Why does this make my prime-generating algorithm take longer?

5 Answers5