Is it possible, with the latest Python C API, to somehow use all CPU cores?
Because of the GIL, only one thread can execute Python code at a time, so Python effectively uses one CPU core and performance is poor on multi-core machines.
But the C API has a poorly documented ability to host several interpreters within one C++ program.
Is it possible, by embedding several interpreters (even one interpreter per C++ thread), to have a separate GIL for each thread/interpreter, so that every C++ thread runs on its own core and the program uses 100% of the CPU?
If I understand the docs correctly, there is only one GIL per program, so the different interpreters created by Py_NewInterpreter() all share the same GIL and cannot each have their own. That would mean that when one interpreter holds the GIL, all other interpreters are blocked. Maybe I'm misreading the docs though...
The task is this: inside a C++ program, I want to execute PyRun_String(...) in each of several threads, and the threads will not share anything. Each PyRun_String() call may run in a separate interpreter if that helps.
Because the C++ threads don't share anything (and hence don't share PyObject * instances), maybe it is possible not to acquire the GIL at all? I don't know whether the global state (global variables) of the Python C API needs GIL protection. If only PyObject * instances need to be protected, then threads that don't share any PyObject * might not need to acquire the GIL. Does anybody know?
Of course I know that it is possible to spawn several processes running this C++ program. But right now I want to understand whether the task (using 100% of all CPU cores) is solvable within a single C++ process.
I was also thinking it may be possible through the following approach. The Python C API is linked via python39.lib, which has some global C variables that hold the state of the interpreter. Maybe it is possible to link the library in such a way that all of its global variables go into some relocatable region, so that each C++ thread can later create its own memory region for them. Every thread would then have its own copy of the global variables, giving a completely separate interpreter state per thread. But I don't know of any way to make the global variables relocatable for a single given .lib file. Do you know of any way to do that?
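
To make the "one copy of the globals per thread" idea concrete, here is a minimal sketch of what I mean, written for code that I compile myself. It relies on C++ `thread_local`, and the names `g_interpreter_state`, `worker` and `run_isolated_threads` are hypothetical stand-ins, not anything from the Python C API. Whether something equivalent can be applied to the globals inside a prebuilt python39.lib is exactly the open question.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for one of the interpreter's global variables.
// `thread_local` gives every thread its own independent copy.
thread_local long g_interpreter_state = 0;

// Hypothetical worker: mutates the "global" state without taking any lock.
static void worker(int increments, long* result) {
    for (int i = 0; i < increments; ++i) {
        ++g_interpreter_state;  // touches only the calling thread's copy
    }
    *result = g_interpreter_state;
}

// Starts n_threads workers and returns each thread's final counter value.
std::vector<long> run_isolated_threads(int n_threads, int increments) {
    assert(n_threads > 0 && increments >= 0);
    std::vector<long> results(n_threads, 0);  // sized up front, so slots never move
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back(worker, increments, &results[t]);
    }
    for (auto& th : threads) {
        th.join();
    }
    return results;
}
```

Calling `run_isolated_threads(4, 1000)` should give every thread a final value of 1000, because no thread ever sees another thread's counter, and the calling thread's own copy stays untouched. This works here only because the variable is declared `thread_local` in source I control, which is not the case for the variables inside the Python library.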