What's causing the 135k/sec page faults in this python code? (trial-division prime sieve)

Question

The following code executes in a work thread, and just happily spins away, receiving pause/report commands, etc. I assume they're soft faults since I'm not having RAM use issues and my hard drive didn't melt over the weekend, and that it's got something to do with how long I've let results accumulate, as I'm only getting ~50 page faults when I started it up a few days ago.

The "counter" attribute is currently at 22,496,115, and "results" has 1,418,641 elements. The slice of "results" is taken because I was feeling contrary and started the list at 1.

def run(self):
    while self.keep_running:
        self.lock.acquire()

        is_prime = True
        self.counter += 1
        cutoff_val = pow(self.counter,.5)
        for number in self.results[1:]:
            if number > cutoff_val:
                break

            if self.counter % number == 0:
                is_prime = False
                break

        if is_prime:
            self.results.append(self.counter)

        self.lock.release()

Note: I know that I could use the Sieve of Eratosthenes to optimize the algorithm and probably cut down on the page faults, but that's not the point: I'm trying to pinpoint the exact reason - or at least the worst offender - behind the page faults so I can avoid doing the same kinds of things in the future. This algorithm is used simply for testing UI responsiveness when I need a "stupidly expensive, simple work thread."

Additional setup pieces as requested:

def __init__(self):
    self.counter = 0
    self.keep_running = False;
    self.lock = threading.Lock()
    self.results = list()

def __call__(self, *args):
    if not self.keep_running:
        self.keep_running = True
        self.run()

Can you post reproducible code? I tried using `self.lock = threading.RLock()` and I don't get any increasing number of page faults using Python 2.7.3 on Windows 8 DP. — AndiDog, Sep 10 '12 at 17:05
@AndiDog : The rest of the setup is added, it justs needs to be stuffed into a class and an instance of the class made the target of threading.Thread. I'm running a new instance of the sieve, and it's not getting any noticeable PF counts so it's definitely just something that's chugging out when it's been running a while. I'm keeping an eye on it. — Redsplinter, Sep 10 '12 at 17:23

steveha · Accepted Answer · 2012-09-10T21:49:16.840

4

I think that @John Gaines Jr. has pointed out something you need to change. If your list is really big, you don't want to be making copies of it like that.

Here is a good way to loop over the same values as self.results[1:] but without making a copy:

res = iter(self.results)  # get an iterator for the list values
next(res)  # iterate once to throw away first value
for number in res:
    # same code you already have goes here
    ...

EDIT: The above code is correct and simple, but doesn't scale well. I thought about it and figured there must be something in itertools for this, and sure enough there is:

import itertools as it
res = it.islice(self.results, 1, None)
for number in res:
    # same code you already have goes here
    ...

EDIT: Thank you to @John Gaines Jr. for pointing out that you can use None instead of len(self.results) in the call to it.islice().

edited Sep 10 '12 at 21:49

answered Sep 10 '12 at 17:26

steveha

74,789
21
92
117

1

Not only is that an awesome answer, but it turns out using the slice directly slowed down the whole process by a ridiculous amount - It's already processed farther in 5 minutes than it did in 3 days. Slicing is definitely to be used with care. – Redsplinter Sep 10 '12 at 17:44
2

There is a shortcut to mean [1:], it is islice(some_iter, 1, None). Mind you, not much shorter. ;) – John Gaines Jr. Sep 10 '12 at 21:21
Wow, thanks for that. That's what I get for just looking at `help(itertools.islice)` and not checking any other sources! Note that, since we are iterating a list, it's easy to call `len()`. But you can use `itertools.islice()` to take a slice from an iterator or generator, and it may not be easy or even possible to compute a length. So shame on me, I should have known there would be an easy solution for this! Here's a link to the online docs: http://docs.python.org/library/itertools.html#itertools.islice – steveha Sep 10 '12 at 21:47

score 3 · Answer 2 · answered Sep 10 '12 at 17:20

From the Python tutorial Lists section:

All slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list a:

>>> a[:]
['spam', 'eggs', 100, 1234]

So in your for loop, the bit self.results[1:] results in a copy of the results list. If this routine is being called over and over, it could quite easily cause memory thrashing.

What's causing the 135k/sec page faults in this python code? (trial-division prime sieve)

2 Answers2