4

I have written a simple implementation of the Sieve of Eratosthenes, and I would like to know if there is a more efficient way to perform one of the steps.

def eratosthenes(n):
    primes = [2]
    is_prime = [False] + ((n - 1)/2)*[True]
    for i in xrange(len(is_prime)):
        if is_prime[i]:
            p = 2*i + 1
            primes.append(p)
            is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])
    return primes

I am using Python's list slicing to update my list of booleans is_prime. Each element is_prime[i] corresponds to an odd number 2*i + 1.

is_prime[i*p + i::p] = [False]*len(is_prime[i*p + i::p])

When I find a prime p, I can mark all elements corresponding to multiples of that prime False, and since all multiples smaller than p**2 are also multiples of smaller primes, I can skip marking those. The index of p**2 is i*p + i.

I'm worried about the cost of computing [False]*len(is_prime[i*p + 1::p]) and I have tried to compare it to two other strategies that I couldn't get to work.

For some reason, the formula (len(is_prime) - (i*p + i))/p (if positive) is not always equal to len(is_prime[i*p + i::p]). Is it because I've calculated the length of the slice wrong, or is there something subtle about slicing that I haven't caught?

When I use the following lines in my function:

print len(is_prime[i*p + i::p]), ((len(is_prime) - (i*p + i))/p)
is_prime[i*p + i::p] = [False]*((len(is_prime) - (i*p + i))/p)

I get the following output (case n = 50):

>>> eratosthenes2(50)
7 7
3 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in eratosthenes2
ValueError: attempt to assign sequence of size 2 to extended slice of size 3

I also tried replacing the bulk updating line with the following:

for j in xrange(i*p + i, len(is_prime), p):
    is_prime[j] = False

But this fails for large values of n because xrange doesn't take anything bigger than a long. I gave up on trying to wrestle itertools.count into what I needed.

Are there faster and more elegant ways to bulk-update the list slice? Is there anything I can do to fix the other strategies that I tried, so that I can compare them to the working one? Thanks!

castle-bravo
  • 1,389
  • 15
  • 34

1 Answers1

2

Use itertools.repeat():

is_prime[i*p + 1::p] = itertools.repeat(False, len(is_prime[i*p + 1::p]))

The slicing syntax will iterate over whatever you put on the right-hand side; it doesn't need to be a full-blown sequence.

So let's fix that formula. I'll just borrow the Python 3 formula since we know that works:

1 + (hi - 1 - lo) / step

Since step > 0, hi = stop and lo = start, so we have:

1 + (len(is_prime) - 1 - (i*p + 1))//p

(// is integer division; this future-proofs our code for Python 3, but requires 2.7 to run).

Now, put it all together:

slice_len = 1 + (len(is_prime) - 1 - (i*p + 1))//p
is_prime[i*p + 1::p] = itertools.repeat(False, slice_len)

Python 3 users: Please do not use this formula directly. Instead, just write len(range(start, stop, step)). That gives the same result with similar performance (i.e. it's O(1)) and is much easier to read.

Kevin
  • 28,963
  • 9
  • 62
  • 81
  • Nice try. Using timeit: >>> timeit.timeit('eratosthenes0(10000000)', setup="from __main__ import eratosthenes0, count", number=1) 3.1473147869110107 >>> timeit.timeit('eratosthenes1(10000000)', setup="from __main__ import eratosthenes1; from itertools import repeat, count", number=1) 4.1984710693359375 – castle-bravo Aug 15 '15 at 03:43
  • @castle-bravo: The formula you posted for the length looks right to me. Please provide a *specific* example where it is wrong. – Kevin Aug 15 '15 at 03:45
  • using the formula that I posted: >>> eratosthenes2(50) Traceback (most recent call last): File "", line 1, in File "", line 9, in eratosthenes2 ValueError: attempt to assign sequence of size 2 to extended slice of size 3 – castle-bravo Aug 15 '15 at 03:48
  • @castle-bravo: OK, I fixed it. – Kevin Aug 15 '15 at 03:52
  • So far my original version is still the fastest. For primes <= 10000000, I have ~3.15s for the original, ~4.2s for itertools.repeat with len(slice), ~3.3s for [False]*formula, and ~4.4s with itertools.repeat with formula. – castle-bravo Aug 15 '15 at 04:02
  • You get the checkmark on account of correcting my formula. – castle-bravo Aug 15 '15 at 04:04
  • Try cProfile. It may be that your slowness is caused by something else. – Kevin Aug 15 '15 at 04:27