
I have two blocks of code, both written to apply the Sieve of Eratosthenes to sum all primes up to 2000000. The first block, which is raw code not wrapped in any function, is this:

N = 2000000
is_prime = (N + 1) * [True]

for candidate in range(2, N + 1):
    if is_prime[candidate]:
        print(candidate)
        for witness in range(2 * candidate, N + 1, candidate):
            is_prime[witness] = False

The second block of code splits this functionality into a function which checks for primality, and a for loop which specifies the upper bound. It is as follows:

from math import sqrt

def is_prime(n):
    is_prime = (n + 1) * [True]

    for candidate in range(2, int(sqrt(n)) + 1):
        if is_prime[candidate]:
            for witness in range(2 * candidate, n + 1, candidate):
                is_prime[witness] = False

    return is_prime[n]

for candidate in range(2, LIMIT):
    if is_prime(candidate):
        print(candidate)

However, the block of code split into the function which checks primality is infinitely slower. I cannot for the life of me figure out what the difference between these blocks of code is. What am I doing wrong?

Peter O.
EthanS

1 Answer


Your second implementation keeps the list is_prime as a local. At every function invocation it "restarts" the computation by initialising the list to (n + 1) * [True].

Because the work restarts on every call, the second implementation runs one full sieve per candidate instead of one sieve in total, so it does roughly N times the work of the first.
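To make the contrast concrete, here is a minimal sketch that builds the sieve once and then only reads from it. The names build_sieve and sieve, and the small LIMIT of 100, are assumptions chosen for illustration and are not from the question.

```python
from math import isqrt

def build_sieve(limit):
    """Return a list where sieve[k] is True exactly when k is prime (0 <= k <= limit)."""
    sieve = [True] * (limit + 1)
    # 0 and 1 are not prime; the slice assignment also copes with limit == 0.
    sieve[0:2] = [False] * min(2, limit + 1)
    for candidate in range(2, isqrt(limit) + 1):
        if sieve[candidate]:
            # Smaller multiples were already crossed out by smaller primes.
            for multiple in range(candidate * candidate, limit + 1, candidate):
                sieve[multiple] = False
    return sieve

LIMIT = 100  # hypothetical small bound; the question uses 2000000
sieve = build_sieve(LIMIT)  # the expensive step happens exactly once
primes = [k for k in range(2, LIMIT + 1) if sieve[k]]
print(sum(primes))
```

Each primality check after the single call to build_sieve is a plain list lookup, which is what the first block in the question effectively does.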

Edit: as @Aaron correctly pointed out in the comments, your call to print() also makes the second version slower.

Problems

In summary, there are the following problems:

  • The implementation using a function restarts its work on every call.
  • Calling print() for every prime adds overhead, although far less than the repeated sieving.
  • A side note: your is_prime list has the same name as your function. Inside the function the list shadows the function, so a recursive call such as is_prime(n - 1) would try to call a list and fail.
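The name clash from the last point can be reproduced with a small sketch. The function is_even below is a hypothetical stand-in, not code from the question; it only shows what happens when a local list reuses the function's name and the function then tries to recurse.

```python
def is_even(n):
    # The local list reuses the function's name and hides the function
    # for the rest of this call.
    is_even = [True, False]
    if n > 1:
        # Intended as a recursive call, but is_even is now a list.
        return is_even(n - 2)
    return is_even[n]

try:
    is_even(3)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Renaming the list (for example to sieve) avoids the problem.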

Improvements

As a very simple improvement you could try to (rename and) move the is_prime list into a global variable. Then when is_prime(n) is called with a number that is not yet in the list, you extend the list (e.g. some_list += difference * [True]) and only compute the difference.
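A rough sketch of that idea follows, under some assumptions: the names _sieve and is_prime_cached are made up for illustration, n is taken to be a non-negative integer, and the list grows to at least double its size so that a loop over ascending candidates does not re-extend it on every call. Only the newly added part is sieved.

```python
from math import isqrt

# Module-level cache: _sieve[k] is True when k is prime. Starts with 0 and 1.
_sieve = [False, False]

def is_prime_cached(n):
    """Extend the cached sieve if n is not covered yet, then look n up."""
    old_size = len(_sieve)
    if n >= old_size:
        # Grow to at least double the size (an assumption made here to
        # keep the number of extensions small).
        new_size = max(n + 1, 2 * old_size)
        _sieve.extend([True] * (new_size - old_size))
        for candidate in range(2, isqrt(new_size - 1) + 1):
            if _sieve[candidate]:
                # First multiple of candidate that lies in the new part.
                first_new = -(-old_size // candidate) * candidate
                start = max(candidate * candidate, first_new)
                for multiple in range(start, new_size, candidate):
                    _sieve[multiple] = False
    return _sieve[n]

print([k for k in range(30) if is_prime_cached(k)])
```

The loop from the question could then call is_prime_cached(candidate) in place of is_prime(candidate); most calls become a single list lookup.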

Flurin
  • you could build on this by mentioning memoization – Aaron Apr 05 '17 at 13:16
  • edit: sorry, misunderstood your comment about printing. Yes, true, printing has an impact too. But I'd say this is negligible given the amount of extra work that is done by restarting. – Flurin Apr 05 '17 at 13:17
  • @Aaron thanks. Extended my answer using your comments. – Flurin Apr 05 '17 at 13:29
  • Flurin, thank you so much for the prompt and in-depth response! As I said, this is my first post on stackoverflow, and getting such a helpful response so quickly is very encouraging. However, I moved is_prime outside of the loop as you suggested, but it appears not to have had an effect on the speed. I have also found that the non-function block of code appears to stop accessing the is_prime list once the largest prime is found, while the second block of code continues to access the list for every witness it is provided. I figured this out by printing is_prime for each witness. – EthanS Apr 05 '17 at 15:30
  • Ethan, you're welcome! Unfortunately it's a bit more involved than just moving the list. Let's assume you call is_prime(4). Your list would then contain [..(omitting 0 and 1)... True, True, False] for the numbers 2, 3 and 4. If you call is_prime(10) you'd need to do the following steps: First, grow the list for the numbers 5 to 10 (the size is now 11). Second, for the numbers 5 to 10, try to divide them by every prime you found (those with True in the list). – Flurin Apr 05 '17 at 16:03
  • You could also try googling for "memoization" and "sieve of Eratosthenes" to see if you find something useful. Memoization, reusing previously computed values, is what you're trying to do here. – Flurin Apr 05 '17 at 16:07
  • I ended up going a slightly different route with the function, which worked, and while it wasn't specifically the answer you provided, I wouldn't have gotten there without your answer. Thank you @Flurin and everyone else! Will be looking into memoization a bit more as well. Cheers. – EthanS Apr 05 '17 at 20:10