An algorithm I want to implement uses indices 1..n, and because shifting every index by one is very error-prone, I decided to get smart and inserted a dummy element at the beginning of every list, so that I can use the original formulas from the paper.
For the sake of brevity, consider this toy example:
def calc(N):
    nums = [0] + range(1, N + 1)
    return sum(nums[1:])  # skip the first (dummy) element
However, I got worried that my results might be spurious, because I could accidentally access the 0-th element somewhere without being aware of it. So I got even smarter and used None instead of 0 as the first element: any arithmetic operation on it raises a runtime error (a TypeError):
def calc_safe(N):
    nums = [None] + range(1, N + 1)  # here we use "None" as the dummy element
    return sum(nums[1:])
Surprisingly, this small change leads to a huge performance penalty for PyPy (even with the current version, 5.8): the code becomes around 10 times slower! Here are the timings on my machine:
                   pypy-5.8    cpython
calc(10**8)         0.5 sec    5.5 sec
calc_safe(10**8)    7.5 sec    5.5 sec
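
For completeness, a minimal harness along these lines reproduces the comparison; it measures wall-clock time with time.time() and assumes the calc and calc_safe definitions above (the timed helper is just for illustration):

import time

def timed(f, *args):
    # measure the wall-clock time of a single call to f
    start = time.time()
    f(*args)
    return time.time() - start

print "calc:      %.1f sec" % timed(calc, 10**8)
print "calc_safe: %.1f sec" % timed(calc_safe, 10**8)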
As a side note: CPython does not care whether None is used or not.
So my question is twofold:

- Obviously, using None is not a good idea, but why?
- Is it possible to get the safety of the None approach and keep the performance? (A possible workaround is sketched below.)
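
One possible workaround, sketched here, is to enable the expensive-but-safe None sentinel only while debugging; the names DEBUG, make_nums, and calc_switchable are made up for illustration, so this is a sketch rather than a real answer to the question:

DEBUG = False  # set to True to catch accidental use of the dummy element

def make_nums(N):
    # None gives safety (arithmetic on it raises), 0 keeps the list all-integer
    sentinel = None if DEBUG else 0
    return [sentinel] + range(1, N + 1)

def calc_switchable(N):
    nums = make_nums(N)
    return sum(nums[1:])  # skip the dummy element

With DEBUG = False the list stays homogeneous, which (see the edit below) lets PyPy keep its fast list representation, while DEBUG = True restores the runtime check.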
Edit: As Armin has explained, not all lists are equal, and we can see which strategy is used via:
import __pypy__
print __pypy__.strategy(nums)
In the first case it is IntegerListStrategy and in the second ObjectListStrategy. The same would happen if we used a large integer value (like 2**100) instead of None.
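
To illustrate, a short session under PyPy shows all three cases (the small lists here are just for demonstration):

import __pypy__

nums_int = [0] + range(1, 11)     # homogeneous integers
nums_obj = [None] + range(1, 11)  # None mixed with integers

print __pypy__.strategy(nums_int)                  # IntegerListStrategy
print __pypy__.strategy(nums_obj)                  # ObjectListStrategy
print __pypy__.strategy([2**100] + range(1, 11))   # also ObjectListStrategy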