(This is a follow-up to Statistical profiler for PyPy)

I'm running some Python code under PyPy and would like to optimize it.

Under CPython, I would use statprof or line_profiler to find out which exact lines are causing the slowdown and work around them. Under PyPy, though, neither tool reports sensible results, since PyPy may optimize some lines away entirely. I would also prefer not to use cProfile, as I find it very difficult to distil which part of the reported function is the bottleneck.
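
For reference, the statprof workflow I have in mind looks something like this under CPython (a sketch; slow_function is just a stand-in for the real code):

    import statprof  # third-party sampling profiler: pip install statprof

    def slow_function():
        # stand-in for the actual workload being profiled
        return sum(i * i for i in range(10 ** 6))

    statprof.start()
    try:
        slow_function()
    finally:
        statprof.stop()
    statprof.display()  # prints per-line sample counts and percentages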

Does anyone have some tips on how to proceed? Perhaps another profiler which works nicely under PyPy? In general, how does one go about optimizing Python code for PyPy?


1 Answer

If you understand how the PyPy architecture works, you'll realize that trying to pinpoint individual lines of code isn't really productive. You start with a Python interpreter written in RPython, which then gets run through a tracing JIT that generates flow graphs and transforms those graphs to optimize the RPython interpreter. What this means is that the layout of the Python code being run by the JIT'ed RPython interpreter may have a very different structure than the optimized assembler that actually gets run. Furthermore, keep in mind that the JIT always works on a loop or a function, so line-by-line stats are not as meaningful. Consequently, I think cProfile may really be a good option for you, since it will give you an idea of where to concentrate your optimization. Once you know which functions are your bottlenecks, you can spend your optimization efforts on those slower functions, rather than trying to fix a single line of Python code.
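
For example, function-level stats can be had with nothing but the standard library (myscript.py and prof.out here are placeholder names):

    # Run as:  pypy -m cProfile -o prof.out myscript.py
    # then inspect the dump with pstats:
    import pstats

    stats = pstats.Stats("prof.out")
    # top 10 functions by cumulative time, with directory noise stripped
    stats.strip_dirs().sort_stats("cumulative").print_stats(10)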

Keep in mind as you do this that PyPy has very different performance characteristics than CPython. Always try to write code in as simple a way as possible (that doesn't mean in as few lines as possible, by the way). There are a few other heuristics that help, such as using specialized lists, preferring objects over dicts when you have a small number of mostly constant keys, and avoiding C extensions that go through the CPython C API.
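
To illustrate the objects-over-dicts heuristic (a toy sketch; the names are made up for the example):

    # A dict with a small, mostly constant set of keys...
    point = {"x": 1.0, "y": 2.0}
    d = point["x"] ** 2 + point["y"] ** 2  # generic hash lookups

    # ...versus a plain class: a stable attribute layout lets PyPy
    # specialize the accesses, much like fields of a struct.
    class Point(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    d = p.x ** 2 + p.y ** 2  # attribute reads the JIT optimizes well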

If you really, really insist on trying to optimize at the line level, there are a few options. One is called JitViewer (https://foss.heptapod.net/pypy/jitviewer), which gives you a very low-level view of what the JIT is doing to your code. For instance, you can even see the assembler instructions which correspond to a Python loop. Using that tool, you can really get a sense of just how fast PyPy will be with certain parts of your code, since you can now do silly things like count the number of assembler instructions used for your loop.
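
The workflow is roughly as follows (a sketch from memory; the PYPYLOG category names may vary by PyPy version, so check the JitViewer README):

    # 1. Run your program with JIT logging enabled:
    PYPYLOG=jit-log-opt,jit-backend:myscript.pypylog pypy myscript.py
    # 2. Feed the log to JitViewer, which serves a local web UI mapping
    #    each Python line to the JIT operations and assembler emitted for it:
    jitviewer.py myscript.pypylog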

  • What is your suggestion? – Ecir Hana Oct 21 '13 at 09:49
  • My suggestion was to just use cProfile, given how PyPy actually works. My second suggestion was to use JitViewer, if you really need a low-level understanding of the performance characteristics of your code. – jlund3 Oct 21 '13 at 22:17
  • The flow graph transformations are applied when compiling and optimizing the RPython code implementing the interpreter itself, *not* your Python code. Those transformations are the equivalent of the transformations the C compiler applies to the C code of the CPython interpreter, and have nothing to do with the end user's Python code. PyPy's JIT does present the kind of problem you're talking about, though; hot loops are likely to be JIT-compiled to multiple different blocks of assembly code, which makes mapping the performance back to individual Python statements tricky. – Ben Dec 04 '13 at 06:58