
I need to call a Python function from my C code. It works perfectly in serial, but it breaks down when I try to parallelize it. Please see the following minimal C code:

#include <Python.h>
#include <stdio.h>

int main(void)
{
  double Z = 1.;
  double k = 1.;
  double l = 1.;
  double eta = -Z/k;

  Py_Initialize();

  PyObject* pName = PyString_FromString("mpmath");
  PyObject* pModule = PyImport_Import(pName);
  PyObject* pFunc = PyObject_GetAttrString(pModule, "coulombf");

  PyObject* pl = PyFloat_FromDouble(l);
  PyObject* peta = PyFloat_FromDouble(eta);

  int i;
#pragma omp parallel for private(i)
  for(i=0; i<10000; i++)
  {
    double r = 0.01*i;
    PyObject* prho = PyFloat_FromDouble(k*r);
    PyObject* pArgs = PyTuple_Pack(3, pl, peta, prho);
    PyObject* pValue = PyObject_CallObject(pFunc, pArgs);
    double value = PyFloat_AsDouble(pValue);
    printf("r=%.2f\tf=%.6f\n",r,value);
  }

  Py_Finalize();
  return 0;
}

Name this file testPython.c; you can compile it with gcc -fopenmp testPython.c -o testPython -I/usr/include/python2.7 -L/usr/lib64/python2.7/config -lpython2.7.

Now run it with ./testPython and you will see an error such as: Fatal Python error: GC object already tracked. (Sometimes the error message differs.)

But if you compile it without -fopenmp, the program works perfectly.

How can I overcome this problem? Thanks!

Edit:

As answered by Natecat, John Bollinger, and Olaf, multithreading is unlikely to speed up the process much, but multiprocessing can really speed up the computation. The pure Python version is as simple as the following:

import numpy
from mpmath import coulombf
from multiprocessing import Pool

Z = 1.
k = 1.
l = 1.
eta = -Z/k

def coulombF(r):
    return coulombf(l,eta,k*r)

pool = Pool(12)
result = pool.map_async(coulombF, numpy.arange(0.,100.,0.01))
print(result.get())

But how do I do it in C? I haven't found a way yet.
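For reference, the process-per-worker idea translates to plain C roughly as in the sketch below. This is a hedged sketch, not a finished solution: the function names (forked_sum, work) are hypothetical, and the placeholder work() stands in for the embedded-Python call. In the real program each forked child would call Py_Initialize(), evaluate coulombf through the Python/C API for its own slice of the grid, and then Py_Finalize(); because every process owns a private interpreter, no GIL is shared and the children truly run in parallel.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-point computation.  In the real
 * program each forked child would initialize its own interpreter here
 * and call coulombf via the Python/C API. */
static double work(double r) { return r * r; }

/* Sum work(0.01*i) for i = 0..n-1, splitting the index range across
 * nproc forked worker processes; partial sums come back over pipes. */
double forked_sum(int nproc, int n)
{
    int (*fds)[2] = malloc(sizeof(int[2]) * nproc);
    for (int p = 0; p < nproc; p++) {
        if (pipe(fds[p]) != 0) { perror("pipe"); exit(1); }
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }
        if (pid == 0) {                        /* child: interleaved slice p */
            close(fds[p][0]);
            double partial = 0.0;
            for (int i = p; i < n; i += nproc)
                partial += work(0.01 * i);
            if (write(fds[p][1], &partial, sizeof partial) != sizeof partial)
                _exit(1);
            _exit(0);
        }
        close(fds[p][1]);                      /* parent keeps the read end */
    }
    double total = 0.0;
    for (int p = 0; p < nproc; p++) {
        double partial = 0.0;
        if (read(fds[p][0], &partial, sizeof partial) == sizeof partial)
            total += partial;
        close(fds[p][0]);
    }
    while (wait(NULL) > 0) ;                   /* reap all children */
    free(fds);
    return total;
}
```

The interleaved slices (i = p, p+nproc, p+2*nproc, ...) keep the per-process workload balanced, and reusing the same long-lived workers for a whole slice keeps the fork overhead to one fork per process rather than one per point.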

Hongcheng Ni

2 Answers


@Natecat's answer is basically right, if a bit lacking in detail and nuance. The docs of Python's C API give a more complete picture. Supposing that CPython is the implementation you are using, you need to be aware of the following:

The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the global interpreter lock or GIL, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program [...].

Therefore, the rule exists that only the thread that has acquired the GIL may operate on Python objects or call Python/C API functions. In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see sys.setswitchinterval()). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime.

and

when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them.

Note: this is exactly the case with OpenMP.

If you need to call Python code from these threads [...] you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure.

The PyGILState_Ensure() and PyGILState_Release() functions do all of the above automatically. The typical idiom for calling into Python from a C thread is:

PyGILState_STATE gstate;
gstate = PyGILState_Ensure();

/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */

/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);

You must implement that pattern to allow multiple OpenMP threads safely to make concurrent calls into the same CPython interpreter, but you are unlikely to get much benefit from the parallelization, as the various OpenMP threads will largely be prevented from running concurrently.

John Bollinger
  • Actually your answer shows that it is very well possible. The actual question is if an application benefits from it. For I/O-heavy applications or if using one/few processing and multiple IO threads this can be a good application. – too honest for this site Mar 29 '16 at 22:33
  • Wouldn't this restrict the python code to 1 thread at a time, defeating the purpose of parallelization? – Natecat Mar 29 '16 at 23:00
  • @Natecat, as I wrote in the answer, "you are unlikely to get much benefit from the parallelization, as the various OpenMP threads will largely be prevented from running concurrently." That's not quite the same thing as having only one Python thread at a time, however. The interpreter will switch among threads, and you can have concurrency during I/O and certain other operations. If such operations are few, as I'm guessing is the case here, then the effective concurrency is low. If such operations are many or long, however, then effective concurrency may be high. – John Bollinger Mar 30 '16 at 02:40
  • Ahh sorry, I missed that part of your answer – Natecat Mar 30 '16 at 04:04
  • @Olaf How to actually implement the multiprocessing in C? – Hongcheng Ni Mar 31 '16 at 09:41
  • @Natecat How to actually implement the multiprocessing in C? – Hongcheng Ni Mar 31 '16 at 09:41
  • @JohnBollinger How to actually implement the multiprocessing in C? – Hongcheng Ni Mar 31 '16 at 09:41
  • @HongchengNi, my answer already explains what your code must do if you want multiple concurrently-running threads safely to all call into the same Python interpreter. It also explains that CPython makes the actual concurrency that you can expect pretty small. This is a characteristic of CPython, not of C. You cannot solve it without ditching CPython for either a Python implementation that supports true concurrency of Python bytecode execution, or for a non-Python implementation of your computation. – John Bollinger Mar 31 '16 at 15:41

True multithreading (e.g. using multiple system threads in one process to run Python code in parallel) is not possible, at least in the most common Python implementations. You can either forgo that type of parallelization or switch to an implementation without a GIL. Here is an article with more information on the subject: https://wiki.python.org/moin/GlobalInterpreterLock

Natecat
  • This is wrong. See the `threading` and `multiprocessing` modules. The GIL actually exists to allow multiprocessing without internal contention in the interpreter. – too honest for this site Mar 29 '16 at 21:35
  • 1
    Threading only allows simulated concurrency, no true multithreading is happening. Multiprocessing is NOT multithreading and has much more significant overhead. I'll change my answer to clarify this – Natecat Mar 29 '16 at 21:36
  • It very well is some kind of multithreading. A thread need not be a separate process. – too honest for this site Mar 29 '16 at 21:37
  • I would not say this is "multithreading in python", because the parallelization is done in C, each thread creates a new Python object. Is this impossible as well? – Hongcheng Ni Mar 29 '16 at 21:38
  • I believe since they are all in one process, the same interpreter would be used for all of them. You could create many different processes to run multiple interpreters at the same time, but this would create much more overhead than the multithreading you are trying to do – Natecat Mar 29 '16 at 21:40
  • @Olaf, the GIL of the C Python implementation indeed does prevent multiple threads from executing Python bytecode concurrently. Cython allows limited concurrency by releasing the GIL at times, such as while performing I/O. – John Bollinger Mar 29 '16 at 21:40
  • @JohnBollinger: AFAIK this is also true for the most common CPython. But not sure. At least `multiprocessing` simulates true parallelism. And e.g. on Linux, the increased overhead is often acceptable (not sure if Windows still fails at that). – too honest for this site Mar 29 '16 at 21:44
  • @Olaf Multiprocessing has nothing to do with multithreading. It creates processes, not threads. Processes and threads are fundamentally different, and the overhead is much much greater on processes than threads. – Natecat Mar 29 '16 at 21:46
  • @Natecat Interesting. Does this mean, say if I have 12 cores, I can simply create 12 processes using `multiprocessing`, and split the task to 12 processes? Can I create 12 processes only once at the beginning, such that the overhead is small? – Hongcheng Ni Mar 29 '16 at 21:54
  • 1
    It isn't guaranteed that the system will run all of the processes on different cores but by having 12 processes you make it possible that it might utilize all 12 cores. You can in fact use the multiprocessing module like that, and in order to not have the overhead of creating a new process each time, simply include in the code of the new process some way for it to move on to the next set of data instead of terminating after finishing with it's initial data set. Also, if my answer is satisfactory, can you click the check mark under the voting symbols to accept my answer? – Natecat Mar 29 '16 at 21:57
  • Sorry, but I don't see where you answer the question justifying accepting. The answer by @JohnBollinger otoh gives much more relevant information. – too honest for this site Mar 29 '16 at 22:28
  • @Natecat: I very well know about the differences. Still this module allows a higher degree of concurrency if you can accept the drawbacks and modifications. Whether this is acceptable for a specific application is a design-choice and should be carefully evaluated. – too honest for this site Mar 29 '16 at 22:31