
Here is a (slightly messy) attempt at Project Euler Problem 49.

I should say outright that the deque was not a good choice! My idea was that shrinking the collection of primes to test for membership would cause the loop to accelerate. However, when I realised that I should have used a set (and stopped worrying about removing elements), I got a 60x speed-up.

from collections import deque
from itertools import permutations
from .sieve import sieve_of_erastothenes  # my own implementation of the Sieve of Eratosthenes

primes = deque(prime for prime in sieve_of_erastothenes(10000) if prime > 1000 and prime != 1487)  # all four-digit primes except 1487
try:
    while True:
        prime = primes.popleft()  # decrease the length of primes each time to speed up membership test
        for inc in xrange(1,10000 + 1 - (2 * prime)):  # this limit ensures we don't end up with results > 10000
            inc1 = prime + inc
            inc2 = prime + 2*inc

            if inc1 in primes and inc2 in primes:
                primestr = str(prime)
                perms = set(''.join(tup) for tup in permutations(primestr))  # because permutations() returns tuples
                inc1str = str(inc1)
                inc2str = str(inc2)
                if inc1str in perms and inc2str in perms:
                    print primestr + inc1str + inc2str
                    raise IOError  # I chose IOError because it's unlikely to be raised
                                   # by anything else in the block. Exceptions are an easy
                                   # way to break out of nested loops.
except IOError:
    pass
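For reference, here is roughly what the faster set-based version looks like. This is a sketch, not the exact code I timed: I've inlined a simple sieve so it stands alone, ported it to Python 3, and wrapped the search in a function instead of the exception trick.

```python
from itertools import permutations

def sieve(limit):
    """Simple Sieve of Eratosthenes: all primes below limit."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

# all four-digit primes except 1487, as a set this time
primes = set(p for p in sieve(10000) if p > 1000 and p != 1487)

def find_sequence():
    for prime in sorted(primes):
        # prime + 2*inc must stay a four-digit number
        for inc in range(1, (10000 - prime) // 2 + 1):
            inc1, inc2 = prime + inc, prime + 2 * inc
            if inc1 in primes and inc2 in primes:
                perms = set(''.join(tup) for tup in permutations(str(prime)))
                if str(inc1) in perms and str(inc2) in perms:
                    return str(prime) + str(inc1) + str(inc2)

print(find_sequence())  # 296962999629
```

With the O(1) set lookups, this finishes in about a second instead of a minute.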

Anyway, before I thought to use a set, I tried it out in PyPy. I found the results rather surprising:

$ time python "problem49-deque.py"
296962999629

real    1m3.429s
user    0m49.779s
sys 0m0.335s

$ time pypy-c "problem49-deque.py"
296962999629

real    5m52.736s
user    5m15.608s
sys 0m1.509s

Why is PyPy over five times slower on this code? I would guess that PyPy's version of the deque is the culprit (since PyPy runs faster on the set version), but I have no idea why that would be.

Zero Piraeus
Benjamin Hodgson

2 Answers


The slow part is inc1 in primes and inc2 in primes. I'll look at why PyPy is so slow here (thanks for the performance bug report, basically). Note that, as you mentioned, the code can be made incredibly faster (on both PyPy and CPython); in this case, just by copying the primes deque into a set just before the for loop.
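In code, the copy Armin describes looks something like this (a sketch with small stand-in data; the real program would keep the full deque of four-digit primes and the permutation check):

```python
from collections import deque

primes = deque([3, 5, 7, 11, 13])  # stand-in data for illustration
hits = []

while primes:
    prime = primes.popleft()
    prime_set = set(primes)  # one O(n) copy per outer iteration...
    for inc in range(1, 7):
        # ...so each membership test inside the inner loop is an O(1)
        # hash lookup instead of an O(n) scan of the deque
        if prime + inc in prime_set and prime + 2 * inc in prime_set:
            hits.append((prime, inc))

print(hits)  # [(3, 2), (3, 4)]
```

The copy costs O(n) once per outer iteration, but it replaces thousands of O(n) scans in the inner loop with O(1) lookups.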

Armin Rigo
    +1, but I haven't accepted the answer because it doesn't yet answer the question. If you find out what the issue is, please could you report back to me? I'd love to know what's causing it. – Benjamin Hodgson Nov 16 '12 at 18:10
    The official bug tracker moved: https://bitbucket.org/pypy/pypy/issues/1327 (of course it has been fixed since forever now.) – Armin Rigo May 29 '16 at 16:15

You should expect membership tests in a deque (with Python's performance characteristics) to be slow, because a membership test in a deque, as in a list, involves a linear scan. By contrast, set is a data structure optimised for membership tests. In that sense, there is no bug here.
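A quick way to see the difference in scaling (an illustrative micro-benchmark, not taken from the original posts):

```python
import timeit
from collections import deque

data = list(range(10000))
d = deque(data)
s = set(data)

# `x in deque` compares element by element: O(n).
# `x in set` is a hash lookup: O(1) on average.
# Searching for the last element makes the deque's scan worst-case.
deque_time = timeit.timeit(lambda: 9999 in d, number=1000)
set_time = timeit.timeit(lambda: 9999 in s, number=1000)
print(deque_time / set_time)  # much greater than 1
```

The gap grows linearly with the size of the deque, which is exactly why the original program spends almost all its time in the two `in primes` tests.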

Marcin
    My question was about the difference in speeds between CPython's `deque` and Pypy's `deque`. I agree (see the question) that a `set` was the right choice of data structure in this particular case and a `deque` was not. – Benjamin Hodgson Nov 16 '12 at 18:23
  • @poorsod Right, but your question is "why does an inappropriate data structure perform poorly". The answer is that it is inappropriate, and that that was knowable in advance. It is good that the CPython membership test code is highly optimised, but it is not bad that the PyPy code is not, because this is not a data structure which is suitable where many such tests are required. – Marcin Nov 16 '12 at 18:26
    I was curious as to the exact reason that Pypy's membership test is _so much_ slower than CPython's. If you feel the question was unclear on that point I'll edit it. – Benjamin Hodgson Nov 16 '12 at 18:30
  • @poorsod It's not that it's unclear. It's that you can't simply insist that people only comment on the parts of your question that you want to limit it to, for whatever reason. – Marcin Nov 16 '12 at 18:43
    I think @poorsod's question is legit, and it's the only reason I'm watching this thread. I get, and I think he gets, that set is the way to go. But as one who is interested in the evolution of PyPy and dynamic JIT implementations in general, it's often very revealing about the engine itself to diagnose why a given behaviour exhibits itself. That's where the true nugget of interest is here. Whether he should hold out on accepting an answer over that desire is another question, I guess. And maybe that's all you're trying to drive home. – Travis Griggs Jan 03 '13 at 00:46