Finding a Prime Sieve Inconsistency in Python

Question

I'm attempting to learn python and I thought trying to develop my own prime sieve would be an interesting problem for the afternoon. When required thus far, I would just import a version of the Sieve of Eratosthenes that I found online -- it's this that I used as my benchmark.

After trying several different optimizations, I thought I had written a pretty decent sieve:

def sieve3(n):
top = n+1
sieved = dict.fromkeys(xrange(3,top,2), True)
for si in sieved:
    if si * si > top:
        break
    if sieved[si]:
        for j in xrange((si*2) + si, top, si*2):       [****]
            sieved[j] = False
return [2] + [pr for pr in sieved if sieved[pr]]

Using the first 1,000,000 integers as my range, this code would generate the correct number of primes and was only about 3-5x slower than my benchmark. I was about to give up and pat myself on the back when I tried it on a larger range, but it no longer worked!

n = 1,000 -- Benchmark = 168 in 0.00010 seconds
n = 1,000 -- Sieve3 = 168 in 0.00022 seconds

n = 4,194,304 -- Benchmark = 295,947 in 0.288 seconds
n = 4,194,304 -- Sieve3 = 295,947 in 1.443 seconds

n = 4,194,305 -- Benchmark = 295,947 in 3.154 seconds
n = 4,194,305 -- Sieve3 = 2,097,153 in 0.8465 seconds

I think the problem comes from the line with [****], but I can't figure out why it's so broken. It's supposed to mark each odd multiple of 'j' as False and it works most of the time, but for anything above 4,194,304 the sieve is broken. (To be fair, it breaks on random other numbers too, like 10,000 for instance).

I made a change and it significantly slowed my code down, but it would actually work for all values. This version includes all numbers (not just odds) but is otherwise identical.

def sieve2(n):
top = n+1
sieved = dict.fromkeys(xrange(2,top), True)
for si in sieved:
    if si * si > top:
        break
    if sieved[si]:
        for j in xrange((si*2), top, si):
            sieved[j] = False
return [pr for pr in sieved if sieved[pr]]

Can anyone help me figure out why my original function (sieve3) doesn't work consistently?

Edit: I forgot to mention, that when Sieve3 'breaks', sieve3(n) returns n/2.

enumerating the keys of a dict is not guaranteed to be ordered, so change your loop to `for si in sorted(sieved.keys()):` — mtadd, Apr 19 '13 at 23:59
That definitely did the trick.. I guess I'd have to get under the hood to figure out why it fails sporadically. Can I mark a comment as the correct answer? If not, feel free to add an answer and I'll gladly accept. Thanks! — MDd, Apr 20 '13 at 00:02
that was a tricky bug to spot. for optimization, you may want to loop over xrange(3,top,2) for the main sieve loop and the final return array rather than the dictionary keys, as both loops should be ordered — mtadd, Apr 20 '13 at 00:06
This would probably be faster if you use an array instead of a dictionary. Initialize sieved = [True] * top, then the i'th element of the array represents the number 2i+3. — user448810, Apr 20 '13 at 03:19

score 2 · Accepted Answer · answered Apr 20 '13 at 00:10

The sieve requires the loop over candidate primes to be ordered. The code in question is enumerating the keys of a dictionary, which are not guaranteed to be ordered. Instead, go ahead and use the xrange you used to initialize the dictionary for your main sieve loop as well as the return result loop as follows:

def sieve3(n):
    top = n+1
    sieved = dict.fromkeys(xrange(3,top,2), True)
    for si in xrange(3,top,2):
        if si * si > top:
            break
        if sieved[si]:
            for j in xrange(3*si, top, si*2):     
                sieved[j] = False
    return [2] + [pr for pr in xrange(3,top,2) if sieved[pr]]

score 1 · Answer 2 · answered Apr 20 '13 at 00:04

It's because dictionary keys are not ordered. Some of the time, by chance, for si in sieved: will loop through your keys in increasing order. With your last example, the first value si gets is big enough to break the loop immediately.

You can simply use: for si in sorted(sieved):

score 0 · Answer 3 · answered Apr 19 '13 at 23:01

0

Well, look at the runtime -- you see that the runtime on the last case you showed was almost 5 times faster than the benchmark, while it had usually been 5 times slower. So that is a red flag, maybe you aren't performing all of the iterations? (And it is 5 times faster while having almost 10 times as many primes...)

I don't have time to look into the code more right now, but I hope this helps.

answered Apr 19 '13 at 23:01

James

2,635
5
23
30

Yeah, I guess I neglected to mention, when it doesn't work, sieve3(n) = n/2, so it must be skipping a bunch of loops. I still can't figure out why though... – MDd Apr 19 '13 at 23:02

Finding a Prime Sieve Inconsistency in Python

3 Answers3