
Is it possible, with the latest Python C API, to use all CPU cores somehow?

Because of the GIL, only one thread can run Python code at a time, so a Python program effectively uses a single CPU core and performs poorly on a multi-core machine.

But the C API has a poorly documented ability to host several interpreters within one C++ program.

By creating several interpreters, even one interpreter per C++ thread, is it possible to give each thread/interpreter its own GIL, so that every C++ thread can run on a separate core and the program can use 100% of the CPU?

If I understand the docs correctly, there is only a single GIL per program, so the interpreters created by Py_NewInterpreter() share the same GIL and cannot have separate ones. That would mean that while I hold the GIL, all other interpreters are blocked. Maybe I'm misreading the docs, though...

The task is this: inside a C++ program, each separate thread executes PyRun_String(...), and the threads share nothing. Each PyRun_String() call may run in its own interpreter if that helps.

Since the C++ threads share nothing (in particular, no PyObject * instances), maybe it is possible not to acquire the GIL at all? I don't know whether the global state (global variables) of the Python C API needs GIL protection. If only PyObject * instances need protection, then threads that don't share any PyObject * might not need the GIL. Does anybody know?
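Sharing no `PyObject*` does not by itself make the lock unnecessary. Below is a toy C++ model of the problem (not CPython code; `intern_name()`, `g_interned` and `run_workers()` are made-up names): the threads never exchange objects, yet every call still mutates one hidden, process-wide table, so all of them must take the same lock. As far as I understand, CPython 3.9 has comparable process-wide state (for instance its interned-string table and the reference counts of shared objects such as `None`), which is part of why the C API docs require holding the GIL before calling almost any API function.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical process-wide table, loosely modelled on an interned-string cache.
// No thread ever receives this object, yet every call below mutates it.
static std::unordered_set<std::string> g_interned;
static std::mutex g_interned_lock;  // plays the role of the single global lock

// Return a stable pointer to the one shared copy of `name`.
const std::string* intern_name(const std::string& name) {
    std::lock_guard<std::mutex> guard(g_interned_lock);  // needed: the table is shared
    auto result = g_interned.insert(name);
    return &*result.first;  // element addresses survive rehashing
}

// Each worker builds only its own local strings and hands nothing to other
// workers, but it still goes through the hidden global table.
void worker(int count) {
    for (int i = 0; i < count; ++i) {
        std::string local = "name_" + std::to_string(i % 10);
        const std::string* first = intern_name(local);
        const std::string* again = intern_name(local);
        assert(first == again);  // same text maps to the same shared copy
        (void)first;
        (void)again;  // silence unused-variable warnings when NDEBUG is defined
    }
}

// Run `threads` workers concurrently and report how many distinct names were interned.
std::size_t run_workers(int threads, int count) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker, count);
    for (auto& th : pool) th.join();
    std::lock_guard<std::mutex> guard(g_interned_lock);
    return g_interned.size();
}
```

Removing the `lock_guard` in `intern_name()` would turn this into a data race even though no object is ever passed between threads.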

Of course, I know it is possible to spawn several processes running this C++ program. But right now I want to understand whether the task (using 100% of all CPU cores) is solvable within a single C++ process.

I was also thinking of another possible solution. The Python C API is linked via python39.lib, which has some global C variables that hold the interpreter's state. Maybe the library could somehow be linked so that all its global variables go into a relocatable region; then each C++ thread could get its own memory region with its own copy of those globals, giving every thread a completely separate interpreter state. But I don't know of any way to make the global variables of a given .lib file relocatable. Do you know of any?
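Relocating the globals of a prebuilt `.lib` is not something this sketch attempts. It shows the source-level equivalent instead: move every former global into one struct and create one instance per thread. A minimal C++ sketch under that assumption, with a hypothetical `ToyInterpreter` standing in for the interpreter state and a trivial sum standing in for running a script:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical interpreter: everything that would normally be a global/static
// variable lives in this struct instead.
struct ToyInterpreter {
    long long accumulator = 0;     // was: static long long g_accumulator
    std::vector<std::string> log;  // was: static std::vector<std::string> g_log

    // "Run a script": sum 1..n, touching only this instance's state.
    void run(long long n) {
        for (long long i = 1; i <= n; ++i) accumulator += i;
        log.push_back("ran " + std::to_string(n));
    }
};

// One interpreter per thread; instances share nothing, so no lock is taken.
std::vector<long long> run_isolated(const std::vector<long long>& jobs) {
    std::vector<ToyInterpreter> interps(jobs.size());  // never resized while threads run
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        threads.emplace_back([&interps, &jobs, i] { interps[i].run(jobs[i]); });
    for (auto& t : threads) t.join();

    std::vector<long long> results;
    for (const auto& in : interps) {
        assert(in.log.size() == 1);  // each instance saw exactly its own job
        results.push_back(in.accumulator);
    }
    return results;
}
```

For example, `run_isolated({10, 100, 1000})` yields `{55, 5050, 500500}`. Because each thread touches only its own `ToyInterpreter`, no lock is needed; this is roughly the kind of isolation a per-interpreter design aims for, at a much smaller scale.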

Arty
  • Have you considered Cython? https://cython.readthedocs.io/en/latest/src/userguide/parallelism.html It natively supports parallelism via OpenMP and can release the GIL. – Niteya Shah Jan 14 '21 at 13:18
  • @NiteyaShah Yes, I know about Cython, but it is not applicable in my case. I want to use PyRun_String() to run arbitrary (even very complex) Python scripts. I don't want to rewrite these huge scripts in Cython notation, because converting code to a nogil Cython variant is not trivial. – Arty Jan 14 '21 at 13:22
  • Just use multiple processes. It's not worth trying to run multiple Python instances inside the same process. – tadman Jan 14 '21 at 14:57
  • @tadman That's the main point of my question. I first want to find out whether it is really impossible within one process. If it is, then, as I said in my question, I'm aware of the possibility of using multiple processes. But it seems to me that this would be quite an easy and desirable feature for CPython: just store all global/static variables inside one structure and allow that structure to be created several times. Then each sub-interpreter would be fully separated (having its own copy of the global structure) and thread-safe. – Arty Jan 14 '21 at 15:09



Currently, CPython uses one GIL shared by all interpreters. The GIL must be held while running Python code to protect internal structures, so Python code cannot run concurrently, even in separate interpreters.
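To make the consequence concrete, here is a toy C++ model (not CPython's implementation) in which every "interpreter" thread must hold one process-wide `std::mutex`, standing in for the shared GIL, while it does work. The hypothetical counters `g_running` and `g_max_running` record how many threads are inside the locked region at once; with a single shared lock that number never exceeds one, however many threads or cores there are.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Toy model of one lock shared by every interpreter (not CPython code).
static std::mutex g_shared_lock;
static std::atomic<int> g_running{0};      // threads currently "running Python"
static std::atomic<int> g_max_running{0};  // highest value ever observed

void run_with_shared_lock(int steps) {
    for (int i = 0; i < steps; ++i) {
        std::lock_guard<std::mutex> guard(g_shared_lock);  // like acquiring the GIL
        int now = ++g_running;
        int prev = g_max_running.load();
        while (now > prev && !g_max_running.compare_exchange_weak(prev, now)) {
            // retry: `prev` now holds the latest maximum
        }
        assert(now == 1);  // the shared lock admits one thread at a time
        --g_running;
    }
}

// Start `threads` workers and return the peak number running simultaneously.
int max_concurrency(int threads, int steps) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(run_with_shared_lock, steps);
    for (auto& th : pool) th.join();
    return g_max_running.load();
}
```

`max_concurrency(8, 1000)` returns 1: extra threads add contention, not parallelism.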

Python 3.10 will have incomplete support for a per-interpreter GIL ([subinterpreters] Meta issue: per-interpreter GIL), but it has to be enabled at build time with --with-experimental-isolated-subinterpreters.
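By contrast, a per-interpreter GIL means each interpreter owns its own lock. A toy C++ sketch of that idea (the `ToyInterp` type and `all_hold_locks_at_once()` are hypothetical and say nothing about how the experimental build option is implemented): each thread locks only its own interpreter's mutex and then waits until every thread holds its lock at the same moment. That rendezvous can only complete when the locks are independent; with one shared lock, the second thread would block forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Toy model: each interpreter owns its lock (not CPython's actual design or API).
struct ToyInterp {
    std::mutex own_lock;  // per-interpreter "GIL"
    long long result = 0;
};

// Every thread grabs its own interpreter's lock, then waits until ALL threads
// hold theirs simultaneously, then does some work under that lock.
bool all_hold_locks_at_once(int n) {
    std::vector<std::unique_ptr<ToyInterp>> interps;
    for (int i = 0; i < n; ++i) interps.push_back(std::make_unique<ToyInterp>());

    std::mutex m;
    std::condition_variable cv;
    int arrived = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            std::lock_guard<std::mutex> gil(interps[i]->own_lock);
            {
                std::unique_lock<std::mutex> lk(m);
                ++arrived;
                cv.notify_all();
                cv.wait(lk, [&] { return arrived == n; });  // rendezvous
            }
            for (long long k = 1; k <= 1000; ++k) interps[i]->result += k;
        });
    }
    for (auto& t : threads) t.join();

    for (const auto& in : interps) assert(in->result == 500500);
    return arrived == n;
}
```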

Mikel Rychliski
  • Are you saying it is already available in Python 3.10 if I rebuild it from source with this flag? For my C++ program it doesn't matter which Python I use; I can use the very latest, even from git, if it helps. I'm only interested in using Python code from C++ (through the Python C API), not in using C++ extensions from Python for now. So with Python 3.10, can I already start writing multi-core programs? – Arty Feb 01 '21 at 16:43
  • Do you also happen to know whether any upcoming CPython version plans to have a JIT (just-in-time compiler)? Using all cores is not the only thing that is crucial for speed; static-type optimizations and compilation to machine code matter too. I know there are PyPy, Numba and Cython, all of which compile Python code to machine code and perform various optimizations. But are there any future plans for this in CPython, if you know? – Arty Feb 01 '21 at 16:53
  • Another interesting question about independent sub-interpreters: obviously Python code (via `PyRun_SimpleString()`) and all objects (`PyObject*`) within the SAME sub-interpreter can interact freely. But what about PyObject-s from different sub-interpreters? Of course they would have to be synchronized with mutexes, because there would no longer be a global GIL, but is it technically allowed at all (if I use a mutex) to mix them? If I call `PyList_Append(list, item)` where `list` and `item` come from different sub-interpreters, will that work, and is it allowed? – Arty Feb 01 '21 at 17:35
  • I don't have any experience using this feature, but judging from the issue it appears to be alpha quality (3.10 hasn't been released yet, and I don't know whether it's planned to be finished by then). You mentioned you aren't using any C++ extensions; you will still need to check every Python dependency, because these probably won't work with the feature when it's done. If you clone the latest source from git, you could try it out, though. – Mikel Rychliski Feb 02 '21 at 03:24
  • The docs for sub-interpreters say you can mix objects between interpreters, but vaguely warn against it. I don't know if this will change once the global state is isolated. No idea about a JIT. I've only read about sub-interpreters here: [Subinterpreter support for Python](https://lwn.net/Articles/754162/) and [Subinterpreters for Python](https://lwn.net/Articles/820424/) – Mikel Rychliski Feb 02 '21 at 03:26
  • There is [PyThreadState_Swap()](https://docs.python.org/3.10/c-api/init.html#c.PyThreadState_Swap) in the current design (3.9) of Python. That means there is a single global variable holding a pointer to the current thread state and the current interpreter. In an isolated-interpreters design this would have to be totally different, because there can't be one global thread state: each core may run a different thread and needs its own state. Probably the `current_thread` variable, which is global now, would have to disappear entirely to support isolated interpreters, and each thread and core should run its own interpreter
  • Do you know where it is described how to use all these functions in the new design? Basically, where can I read [this document](https://docs.python.org/3.10/c-api/init.html) but for the new design with support for isolated sub-interpreters? Do you know what I can read to start coding with these new features, even if they are still experimental? – Arty Feb 02 '21 at 03:46
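On the `PyThreadState_Swap()` point: one common way to replace a single process-wide "current state" pointer is a thread-local one, so each OS thread sees only the state it installed. A toy C++ sketch (the names `ToyThreadState`, `t_current`, `swap_current` and `run_threads` are hypothetical, and this is not a description of how any specific CPython version is implemented):

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical toy thread state; the name and fields are illustrative only.
struct ToyThreadState {
    explicit ToyThreadState(int id) : interp_id(id) {}
    int interp_id;
    long long counter = 0;
};

// One "current state" pointer per OS thread instead of a single process-wide one.
thread_local ToyThreadState* t_current = nullptr;

// Same shape as a swap function: install `next`, return what was there before.
ToyThreadState* swap_current(ToyThreadState* next) {
    ToyThreadState* prev = t_current;
    t_current = next;
    return prev;
}

// Code that needs "the current interpreter" finds it through t_current.
void do_work(int steps) {
    for (int i = 0; i < steps; ++i) ++t_current->counter;
}

std::vector<long long> run_threads(int n, int steps) {
    std::vector<ToyThreadState> states;
    for (int i = 0; i < n; ++i) states.emplace_back(i);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&states, i, steps] {
            ToyThreadState* prev = swap_current(&states[i]);
            assert(prev == nullptr);             // fresh thread: nothing installed yet
            assert(t_current->interp_id == i);   // other threads cannot overwrite it
            do_work(steps);
            swap_current(prev);                  // restore the previous state
        });
    }
    for (auto& t : threads) t.join();
    std::vector<long long> out;
    for (const auto& s : states) out.push_back(s.counter);
    return out;
}
```

Each thread increments only the counter of the state it installed, so no lock is involved in finding "the current interpreter".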

Is it possible, with the latest Python C API, to use all CPU cores somehow?

No, not easily. I assume a Linux system (adapt my answer to your OS).

You could either write an extension in C using Pthreads (or in C++ using C++ threads), or run several Python processes (e.g. communicating through unix(7) sockets or fifo(7)...).
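A minimal POSIX sketch of the multi-process route, in C++ for consistency with the question. It uses anonymous pipes from pipe(2) rather than the unix(7) sockets or fifo(7) mentioned above, and a plain loop (`partial_sum()`, a made-up stand-in) in place of running a Python script; in a real program each child would initialize its own interpreter and call PyRun_String(). Because each child is a separate process, each would also have its own interpreter and GIL.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Stand-in for "run one script": the sum of the integers in [lo, hi).
static long long partial_sum(long long lo, long long hi) {
    long long s = 0;
    for (long long i = lo; i < hi; ++i) s += i;
    return s;
}

// Split [0, n) across `workers` child processes. Each child computes its slice and
// sends the 8-byte result back through its own pipe; the parent adds them up.
// Returns -1 if pipe() or fork() fails (cleanup of already-started children is
// omitted to keep the sketch short).
long long sum_with_processes(long long n, int workers) {
    assert(workers > 0);
    std::vector<int> read_ends;
    std::vector<pid_t> pids;
    for (int w = 0; w < workers; ++w) {
        int fds[2];
        if (pipe(fds) != 0) return -1;
        long long lo = n * w / workers;
        long long hi = n * (w + 1) / workers;
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {  // child: separate address space, nothing shared with siblings
            close(fds[0]);
            long long r = partial_sum(lo, hi);
            ssize_t written = write(fds[1], &r, sizeof r);
            // _exit skips the parent's atexit handlers and stdio flushing
            _exit(written == static_cast<ssize_t>(sizeof r) ? 0 : 1);
        }
        close(fds[1]);  // parent keeps only the read end
        read_ends.push_back(fds[0]);
        pids.push_back(pid);
    }
    long long total = 0;
    for (int fd : read_ends) {
        long long r = 0;
        if (read(fd, &r, sizeof r) == static_cast<ssize_t>(sizeof r)) total += r;
        close(fd);
    }
    for (pid_t p : pids) waitpid(p, nullptr, 0);
    return total;
}
```

`sum_with_processes(1000000, 4)` splits the range over four children, which can run on separate cores because they share no memory, and returns 499999500000.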

Note that the Python implementation is open-source software. You are allowed to study it and improve it to achieve your goals.

Basile Starynkevitch
  • Thanks! Do you know the answer to this: if my C++ threads don't share any PyObject* objects at all, do I still need to acquire the GIL in each thread? In other words, is GIL locking needed only to protect PyObject* instances, or is it also needed for some internal state tables of the Python C API? Are these internal tables (global variables) protected in some way, i.e. are they thread-safe? Or must the GIL be acquired 100% of the time, no matter what is shared between threads? – Arty Jan 14 '21 at 13:07
  • The Python implementation is open source, so study its source code. Perhaps improve it to achieve your goals. Budget several months of full-time work. – Basile Starynkevitch Jan 14 '21 at 13:08