How to improve memory intensive Python scripts?

Question

Today, I wrote a short script for a prime sieve, and I am looking to improve on it. I am rather new to python and programming in general, and so I am wondering: what is a good way to reduce memory usage in a program where large lists of numbers are involved? Here is my example script:

def ES(n):
    A = list(range(2, n+1))
    for i in range(2, n+1):
        for k in range(2, (n+i)//i):
            A[i*k-2] = str(i*k)
    A = [x for x in A if isinstance(x, int)]
    return A

This script turns all composites in the list, A, into strings and then returns the list of remaining integers, which are all prime, yet it runs the A[i*k-2] = str(i*k) three times for the number 12, as it goes through all multiples of 2, then 3 and again for 6. With something like that happening, while storing such a large list, I hit a brick wall rather soon and it crashes. Any advice would be greatly appreciated! Thanks in advance.

EDIT: I don't know if this makes a difference, but I'm using Python 3.3

Why are you replacing numbers with string representations of themselves to mark them composite? Why not just keep a big list of booleans representing composite/not composite? — user2357112, Aug 23 '13 at 03:51
well, I'm pretty sure this is O(n^2) so it is pretty inefficient in general, but once i hit about 500,000 it will usually crash. — Plopperzz, Aug 23 '13 at 03:53
I have no formal education in programming until September, so all my programs are pretty much put together any way i can think of to get results. — Plopperzz, Aug 23 '13 at 03:55
@user2357112 I got my idea of converting them into strings from the explanation of the sieve of Eratosthenes — Plopperzz, Aug 23 '13 at 03:57

score 2 · Accepted Answer · answered Aug 23 '13 at 04:01

First, you're using a really weird, inefficient way to record whether something is composite. You don't need to store string representations of the numbers, or even the numbers themselves. You can just use a big list of booleans, where prime[n] is true if n is prime.

Second, there's no reason to worry about wasting a bit of space at the beginning of the list if it simplifies indexing. It's tiny compared to the space taken by the rest of the list, let alone all the strings and ints and stuff you were using. It's like saving $3 on paint for your $300,000 car.

Third, range takes a step parameter you can use to simplify your loops.

def sieve(n):
    """Returns a list of primes less than n."""

    # n-long list, all entries initially True
    # prime[n] is True if we haven't found a factor of n yet.
    prime = [True]*n

    # 0 and 1 aren't prime
    prime[0], prime[1] = False, False

    for i in range(2, n):
        if prime[i]:
            # Loop from i*2 to n, in increments of i.
            # In other words, go through the multiples of i.
            for j in range(i*2, n, i):
                prime[j] = False

    return [x for x, x_is_prime in enumerate(prime) if x_is_prime]

Hey, thank you very much. That makes total sense to me now; I was reading a bunch of scripts for prime sieves and they all implemented booleans, and I just had no idea what they were doing. I appreciate the explanations. I would you up if i could! — Plopperzz, Aug 23 '13 at 04:08
@Plopperzz: You're welcome. You should be able to accept my answer by clicking the outline of a check mark next to the vote score. — user2357112, Aug 23 '13 at 04:11

jh314 · Answer 2 · 2013-08-23T04:08:41.790

0

Rather than using large lists, you can use generators.

def ESgen():
    d = {}  
    q = 2   
    while 1:
        if q not in d:
            yield q        
            d[q*q] = [q]   
        else:
            for p in d[q]: 
                d.setdefault(p+q,[]).append(p)
            del d[q]      
        q += 1

To get the list of first n primes using the generator, you can do this:

from itertools import islice

gen = ESgen()
list(islice(gen, n))

If you only want to check if a number is prime, set is faster than list:

from itertools import islice

gen = ESgen()
set(islice(gen, n))

edited Aug 23 '13 at 04:08

answered Aug 23 '13 at 03:55

jh314

27,144
16
62
82

Thanks, I should take a look a better look at generators. – Plopperzz Aug 23 '13 at 04:17

How to improve memory intensive Python scripts?

2 Answers2