Sieve of Eratosthenes set-implementation confusion

Question

I wanted to first preface that I'm a Python newbie and that I'd be grateful to anyone who can explain this to me clearly and completely.

I was looking at the code found in the link below:

http://rosettacode.org/wiki/Sieve_of_Eratosthenes#Python

I've just begun to understand iterators, generators, and the yield statement, but I don't understand how the code for the set implementation works.

def eratosthenes2(n):
    multiples = set()
    for i in range(2, n+1):
        if i not in multiples:
            yield i
            multiples.update(range(i*i, n+1, i))

I'm having difficulty understanding what the last line in this function does.

Additionally, can someone explain to me why this implementation is O(log(n)) time?

Thank you Gregor. I appreciate you taking the time to tell me this and I'll be sure to keep this in mind the next time I'm stuck on a problem — Roger Josh, Dec 25 '15 at 22:40

score 1 · Answer 1 · answered Oct 20 '15 at 01:21

The last line:

multiples.update(range(i*i, n+1, i))

Adds all the multiples of i from the square of i up to n to the set multiples. Any multiple below the square of i will already be in the set from an earlier i.
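To see this concretely, here is a small trace. It is only a sketch: it reuses the logic from the question with n = 30, and the helper name `trace_sieve` is made up for illustration. It records which values each prime passes to `update`:

```python
def trace_sieve(n):
    """Return {prime: list of values that prime passes to multiples.update}."""
    multiples = set()
    added = {}
    for i in range(2, n + 1):
        if i not in multiples:
            new = list(range(i * i, n + 1, i))  # i*i, i*i + i, ... up to n
            added[i] = new
            multiples.update(new)
    return added

trace = trace_sieve(30)
for prime, values in trace.items():
    print(prime, values)
```

For 3 this gives 9, 12, 15, ..., 30, and for 5 it gives 25 and 30. From 7 onwards the square is already past 30, so the range is empty and nothing is added.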

Rosetta doesn't say the algorithm is O(log(n)), and it certainly isn't; it only says that set lookup is O(log(n)) versus O(n) for a list. In fact, Python sets use hashing for lookup, so a membership test is O(1) on average, versus O(n) for a list.

answered Oct 20 '15 at 01:21

AChampion


Python `set` lookup is `O(1)`, not `O(log n)`; it's based on hashing, not binary trees or anything. In practice, it's usually a little more than a single check because of hash collisions, but it's not tied to the size of the `set` in a meaningful way. – ShadowRanger Oct 20 '15 at 01:24
I thought I covered this in my last sentence, did I miss something? – AChampion Oct 20 '15 at 01:26
The first sentence in that paragraph implies that expected time would be `O(log n)`. The error is on Rosetta's side; sorry for the confusion. – ShadowRanger Oct 20 '15 at 01:27
Agreed, Rosetta is overstating the cost of set lookup (on average), a better source is: https://wiki.python.org/moin/TimeComplexity – AChampion Oct 20 '15 at 01:29

mooiamaduck · Accepted Answer · 2015-10-20T01:43:41.337

The expression range(i, j, k) produces the integers from i up to j (j itself is excluded, so the inclusive bound is j-1), at intervals of k (which is 1 by default). So range(2, 10, 2) produces 2, 4, 6, 8. In Python 2 this is the list [2, 4, 6, 8]; in Python 3 it is a lazy range object that yields the same values.
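A quick way to check this yourself is to wrap the call in list(), which shows the values on either Python 2 or Python 3. The variable names and the choice of n = 20 below are just for illustration:

```python
# range(start, stop, step): stop is excluded, step defaults to 1
evens = list(range(2, 10, 2))
print(evens)

# The sieve's call for i = 3 and n = 20: multiples of 3 from 9 up to 20
threes = list(range(3 * 3, 20 + 1, 3))
print(threes)

# When the start is already past the stop, the range is empty
empty = list(range(7 * 7, 20 + 1, 7))
print(empty)
```

The last case is why large primes add nothing to the set: once i × i exceeds n, update receives an empty range.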

What the last line does is insert all multiples of i from i² up to n into the set multiples. We start at i² because i is a prime number (since it was not found in the sieve), and the smallest multiple of i not yet in multiples is i × i. Proof: any smaller multiple has the form c × i for some c where 1 < c < i. Such a c has a prime factor p ≤ c < i, so c × i is also a multiple of p and was already added to the set when p was processed. We end the range at n+1 because that's where the sieve ends (the +1 makes up for the fact that the end bound is non-inclusive). And of course the step is set to i so that the range produces multiples of i.
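If you want to convince yourself of that argument, the sketch below runs the same sieve and asserts, each time a prime i is found, that every smaller multiple c × i is already in the set. The function name `check_square_start` and the bound 200 are arbitrary choices for this illustration:

```python
def check_square_start(n):
    """Run the set-based sieve, checking the i*i starting point as it goes."""
    multiples = set()
    primes = []
    for i in range(2, n + 1):
        if i not in multiples:
            # Every c*i with 1 < c < i (and c*i <= n) must already be marked
            for c in range(2, i):
                if c * i <= n:
                    assert c * i in multiples, (c, i)
            primes.append(i)
            multiples.update(range(i * i, n + 1, i))
    return primes

primes = check_square_start(200)
print(primes[:10])
```

If starting at i × i ever skipped a composite, one of the assertions would fail; here they all pass.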

The bit about O(log(n)) refers to the time complexity of testing set membership in common set implementations (such as balanced-tree sets), not to the full algorithm. The whole algorithm cannot take less than linear time in n, since the outer loop runs n-1 times (from 2 to n). In Python, the set membership test actually takes O(1) time on average, since Python sets are hash tables. Alternatively, you could use a list of n+1 bools indexed by number, which usually performs better, at the cost of allocating the whole list up front.
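The list-of-bools alternative could look something like the following. This is a minimal sketch rather than the Rosetta Code version; the names `eratosthenes_flags` and `is_composite` are made up here:

```python
def eratosthenes_flags(n):
    """Yield the primes up to n, using a list of booleans instead of a set."""
    # is_composite[k] is True once k is known to be a multiple of a prime
    is_composite = [False] * (n + 1)
    for i in range(2, n + 1):
        if not is_composite[i]:
            yield i
            for m in range(i * i, n + 1, i):
                is_composite[m] = True

primes = list(eratosthenes_flags(30))
print(primes)
```

The structure mirrors the set version: the membership test `i not in multiples` becomes an index lookup, and `update` becomes a loop that sets flags.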

Thank you so much! I appreciate you taking the time to help me understand — Roger Josh, Oct 20 '15 at 04:45
