
I have skimmed the PyPy implementation details and gone through the source code as well, but PyPy's execution path is still not entirely clear to me.

Sometimes bytecode is produced, and sometimes that step seems to be skipped in favor of compiling directly to machine code (interpreter-level/app-level code). But I can't figure out when and where exactly the machine code is produced and handed to the OS to be executed as low-level instructions (RAM/CPU).

I managed to get this straight for CPython: there is a giant switch in `ceval.c`, itself already compiled, which interprets the bytecode and runs the corresponding code (in actual C). Makes sense.
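For reference, the shape of that CPython loop can be sketched in a few lines of Python. This is a toy stack machine, not CPython's real opcode set or `ceval.c` code; the opcode names and `eval_frame` are illustrative only. What it shows is that each bytecode is dispatched to code that was compiled ahead of time, so no new machine code is generated while the program runs:

```python
# Illustrative opcode numbers for a toy stack machine (not CPython's real ones).
LOAD_CONST, BINARY_ADD, RETURN_VALUE = range(3)


def eval_frame(code, consts):
    """Fetch-decode-execute loop shaped like the big switch in ceval.c."""
    stack = []
    pc = 0
    while True:
        opcode, arg = code[pc]  # fetch the next (opcode, argument) pair
        pc += 1
        # "switch" on the opcode; each branch is ordinary, already-compiled code
        if opcode == LOAD_CONST:
            stack.append(consts[arg])
        elif opcode == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Bytecode for "return 2 + 3"
program = [(LOAD_CONST, 0), (LOAD_CONST, 1), (BINARY_ADD, None), (RETURN_VALUE, None)]
print(eval_frame(program, [2, 3]))  # prints 5
```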
For PyPy, however, I have not managed to get a clear view of how this is done specifically. (I do not want to get into PyPy's various optimization details; that's not what I am after here.)

I would be satisfied with an answer that points to the PyPy source code, so as to avoid "hearsay" and be able to see it "with my own eyes". (I did spot the JIT backends under /rpython, with assemblers for the various CPU architectures.)

Mehdi LAMRANI

Your best guides are the PyPy architecture documentation and the actual JIT documentation.

What jumped out the most for me is this:

we have a tracing JIT that traces the interpreter written in RPython, rather than the user program that it interprets.

This is covered in more detail in the JIT overview.

It seems to me that the "core" is this (from here):

Once the meta-interpreter has verified that it has traced a loop, it decides how to compile what it has. There is an optional optimization phase between these actions which is covered further down this page. The backend converts the trace operations into assembly for the particular machine. It then hands the compiled loop back to the frontend. The next time the loop is seen in application code, the optimized assembly can be run instead of the normal interpreter.
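To make that cycle concrete, here is a toy Python sketch of a tracing JIT: count backward jumps, record what the interpreter does during one iteration once a loop is hot, turn that trace (with guards) into code, and run the compiled loop on the next visit. It is only an illustration under simplifying assumptions: the tiny instruction set, `HOT_THRESHOLD`, `interpret` and `compile_trace` are hypothetical names, the "backend" emits Python source via `exec` instead of machine code, and because the toy bytecode maps one-to-one onto recorded operations it blurs the distinction between tracing the interpreter and tracing the user program, which is the key point of PyPy's design.

```python
HOT_THRESHOLD = 3  # hypothetical: backward jumps seen before tracing starts


def compile_trace(trace):
    """Toy 'backend': turn a recorded trace into a Python function.

    A real JIT backend would emit machine code here; this one generates
    Python source and exec()s it, purely for illustration.
    """
    lines = ["def loop(regs):", "    while True:"]
    for op in trace:
        if op[0] == "add":
            lines.append(f"        regs[{op[1]!r}] += regs[{op[2]!r}]")
        elif op[0] == "dec":
            lines.append(f"        regs[{op[1]!r}] -= 1")
        elif op[0] == "guard_nonzero":
            # Guard failure leaves compiled code; interpretation resumes at op[2]
            lines.append(f"        if regs[{op[1]!r}] == 0:")
            lines.append(f"            return {op[2]}")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["loop"]


def interpret(program, regs):
    """Interpret a tiny register bytecode, tracing and compiling hot loops."""
    counts = {}       # loop header pc -> number of backward jumps seen
    compiled = {}     # loop header pc -> compiled loop function
    trace = None      # operations recorded while tracing, else None
    loop_header = None
    pc = 0
    while True:
        if trace is None and pc in compiled:
            pc = compiled[pc](regs)  # run compiled code; it returns the exit pc
            continue
        op = program[pc]
        kind = op[0]
        if kind == "halt":
            return regs
        elif kind == "add":
            regs[op[1]] += regs[op[2]]
            if trace is not None:
                trace.append(op)
            pc += 1
        elif kind == "dec":
            regs[op[1]] -= 1
            if trace is not None:
                trace.append(op)
            pc += 1
        elif kind == "jnz":
            reg, target = op[1], op[2]
            taken = regs[reg] != 0
            if trace is not None:
                if not taken:
                    trace = None  # loop exited while tracing: abandon the trace
                else:
                    # The branch direction just observed becomes a guard.
                    trace.append(("guard_nonzero", reg, pc + 1))
                    if target == loop_header:  # back at the header: loop closed
                        compiled[target] = compile_trace(trace)
                        trace = None
            elif taken and target < pc and target not in compiled:
                counts[target] = counts.get(target, 0) + 1
                if counts[target] >= HOT_THRESHOLD:
                    trace, loop_header = [], target  # loop is hot: start tracing
            pc = target if taken else pc + 1
        else:
            raise ValueError(f"unknown instruction {op!r}")


# acc = n + (n - 1) + ... + 1
program = [
    ("add", "acc", "n"),  # 0: acc += n
    ("dec", "n"),         # 1: n -= 1
    ("jnz", "n", 0),      # 2: if n != 0, jump back to 0
    ("halt",),            # 3
]
print(interpret(program, {"acc": 0, "n": 10})["acc"])  # prints 55
```

With `n = 10`, the first three iterations are interpreted, the fourth is traced, and the remaining six run inside the generated `loop` function until its guard fails and control returns to the interpreter at `halt`.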

This paper (PDF) might also be helpful.

Edit: Looking at the x86 backend, rpython/jit/backend/x86/rx86.py, the backend doesn't so much compile as spit out machine code directly. Look at the X86_64_CodeBuilder and AbstractX86CodeBuilder classes. One level higher is the Assembler386 class in rpython/jit/backend/x86/assembler.py. This assembler uses the MachineCodeBlockWrapper from rpython/jit/backend/x86/codebuf.py, which is based on the X86_64_CodeBuilder for x86-64.
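To illustrate what "spitting out machine code directly" means, here is a hypothetical, much-simplified builder in Python. It is not PyPy's actual API (`ToyX86_64Builder` and its method names are invented for this sketch); it only shows the idea that each method appends the raw bytes of one x86-64 instruction to a buffer. The example assumes the System V x86-64 calling convention (first integer argument in `rdi`, result in `rax`):

```python
import struct


class ToyX86_64Builder:
    """Hypothetical, much-simplified code builder (not PyPy's real API).

    Each method appends the raw bytes of one x86-64 instruction to a buffer.
    """

    def __init__(self):
        self._buf = bytearray()

    def mov_rax_imm64(self, value):
        # REX.W prefix (0x48), opcode B8+r with r = rax (0), 8-byte little-endian immediate
        self._buf += b"\x48\xb8" + struct.pack("<q", value)

    def add_rax_rdi(self):
        # REX.W, ADD r/m64, r64 (0x01), ModRM 0xF8: mod=11, reg=rdi (7), rm=rax (0)
        self._buf += b"\x48\x01\xf8"

    def ret(self):
        # near return
        self._buf += b"\xc3"

    def to_bytes(self):
        return bytes(self._buf)


# Machine code for "return x + 42"
builder = ToyX86_64Builder()
builder.mov_rax_imm64(42)
builder.add_rax_rdi()
builder.ret()
print(builder.to_bytes().hex(" "))  # prints 48 b8 2a 00 00 00 00 00 00 00 48 01 f8 c3
```

The sketch deliberately stops at producing bytes; executing them would additionally require placing them in executable memory and calling into that memory, which it does not attempt.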

Roland Smith
  • Thanks. I have been reading those same pages for hours, but I still cannot wrap my head around it. I have a feeling that the precise, specific answer I am looking for is somewhere between those lines, but it is drowned in a load of dense information. I went down the "trace" road, but all it says is that detected traces are compiled by the JIT (how, and where in the code, who knows...) – Mehdi LAMRANI Aug 30 '20 at 20:52
  • Thank you for your edits. Good material to look at. It looks like there is no way around diving deep into the JIT code, but at least I now know where in the code the actual machine code is produced. Very insightful. I was also wondering which module actually fires/executes the generated machine code, if you happen to know... (it would spare me a couple of hours of diving, but I'm off to it anyway) – Mehdi LAMRANI Aug 31 '20 at 10:20
  • It would also be nice to have a flow chart of PyPy's execution path. I crawled through many, many docs, but short of reading pages and pages of explanations, there does not seem to be a "one-page diagram" that gives the big picture – Mehdi LAMRANI Aug 31 '20 at 10:29
  • @MehdiLAMRANI The generated machine code is executed by the front-end, but I haven't looked into the details. Regarding a flowchart, you could try running `pypy` with one of the several existing profiler modules, e.g. `line-profiler`. That would at least show you the "flow" of the code. – Roland Smith Aug 31 '20 at 16:44
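If the goal is the call "flow" rather than per-line timings (which is what `line-profiler` reports), a stdlib-only alternative is a profile hook. The sketch below uses `sys.setprofile` to record an indented call tree; `trace_calls`, `leaf`, `middle` and `top` are hypothetical names for illustration. Note that a hook like this only sees app-level Python frames, so under PyPy it shows the flow of your program, not of the RPython interpreter or the JIT-generated machine code.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) and return its Python-level call tree, one indented name per call."""
    lines = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":        # a Python function was entered
            lines.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":    # a Python function returned
            depth -= 1
        # "c_call"/"c_return"/"c_exception" events for builtins are ignored

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return lines


def leaf():
    return 1


def middle():
    return leaf() + leaf()


def top():
    return middle()


print("\n".join(trace_calls(top)))
```

Running it prints `top`, then `middle` indented one level, then `leaf` twice indented two levels.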