An algorithm I want to implement uses indices 1..n, and because shifting every index by one is very error-prone, I decided to get smart and inserted a dummy element at the beginning of every list, so that I can use the original formulas from the paper.
For the sake of brevity, consider this toy example:
def calc(N):
    nums = [0] + range(1, N + 1)
    return sum(nums[1:])  # skip the first (dummy) element
However, I got worried that my results might be spurious, because I could accidentally access the 0-th element somewhere without being aware of it. So I got even smarter and used None instead of 0 as the first element: any arithmetic operation on it raises a runtime error (a TypeError):
def calc_safe(N):
    nums = [None] + range(1, N + 1)  # here we use "None" as the dummy element
    return sum(nums[1:])
Surprisingly, this small change leads to a huge performance penalty for PyPy (even with the current version, 5.8): the code becomes around 10 times slower! Here are the timings on my machine:
                   pypy-5.8    cpython
calc(10**8)         0.5 sec    5.5 sec
calc_safe(10**8)    7.5 sec    5.5 sec
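
For completeness, a minimal harness along these lines reproduces the comparison; it measures wall-clock time with time.time() and assumes the calc and calc_safe definitions above (the timed helper is just for illustration):

import time

def timed(f, *args):
    # measure the wall-clock time of a single call to f
    start = time.time()
    f(*args)
    return time.time() - start

print "calc:      %.1f sec" % timed(calc, 10**8)
print "calc_safe: %.1f sec" % timed(calc_safe, 10**8)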
As a side note: CPython does not care whether None is used or not.
So my question is twofold:

- Obviously, using None is not a good idea, but why?
- Is it possible to get the safety of the None approach and keep the performance? (A possible workaround is sketched below.)
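
One possible workaround, sketched here, is to enable the expensive-but-safe None sentinel only while debugging; the names DEBUG, make_nums, and calc_switchable are made up for illustration, so this is a sketch rather than a real answer to the question:

DEBUG = False  # set to True to catch accidental use of the dummy element

def make_nums(N):
    # None gives safety (arithmetic on it raises), 0 keeps the list all-integer
    sentinel = None if DEBUG else 0
    return [sentinel] + range(1, N + 1)

def calc_switchable(N):
    nums = make_nums(N)
    return sum(nums[1:])  # skip the dummy element

With DEBUG = False the list stays homogeneous, which (see the edit below) lets PyPy keep its fast list representation, while DEBUG = True restores the runtime check.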
Edit: As Armin has explained, not all lists are equal, and we can see which strategy is used via:
import __pypy__
print __pypy__.strategy(nums)
In the first case it is IntegerListStrategy and in the second ObjectListStrategy. The same would happen if we used a large integer value (like 2**100) instead of None.
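
To illustrate, a short session under PyPy shows all three cases (the small lists here are just for demonstration):

import __pypy__

nums_int = [0] + range(1, 11)     # homogeneous integers
nums_obj = [None] + range(1, 11)  # None mixed with integers

print __pypy__.strategy(nums_int)                  # IntegerListStrategy
print __pypy__.strategy(nums_obj)                  # ObjectListStrategy
print __pypy__.strategy([2**100] + range(1, 11))   # also ObjectListStrategy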