
Python C extensions generally release the GIL via Py_BEGIN_ALLOW_THREADS when making blocking I/O system calls. I have read mixed opinions on whether C extensions also release the GIL for relatively long-running CPU-bound tasks. I've read that CPU-bound work inside the Python interpreter is subject to the GIL, but that CPU-bound work inside a C extension often releases it. I assume this would be safe and efficient as long as the CPU work is performed on temporary variables with fine-grained locking, and the GIL is re-acquired before any update to a shared structure takes place.

I searched through the 3.6 C code on GitHub for all instances of Py_BEGIN_ALLOW_THREADS. I might have missed some, but it seems to me that Python C extensions in general do not release the GIL for CPU-bound tasks.

Do Python C Extensions typically release the GIL for CPU tasks as they do for IO tasks?


Here are the only examples I saw where they did release the GIL for a CPU task:

The _hashlib.h has a macro ENTER_HASHLIB with a comment indicating that it will release the GIL around a CPU consuming hashlib operation.

The sha3module.c appears to allow threads before a CPU operation.

The _lzmamodule.c appears to allow threads before a CPU operation.
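As a rough way to exercise the hashlib case from the Python side, here is a minimal sketch. It relies on the hashlib documentation's statement that the GIL is released for data larger than 2047 bytes; the buffer sizes, the worker count, and the helper names (`digest`, `hash_sequential`, `hash_threaded`) are my own assumptions for illustration. It only checks that threaded and sequential hashing agree; any speed-up from threads depends on the build and the machine, so no timing is asserted.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workload: four 4 MiB buffers, far above the 2047-byte
# threshold at which hashlib is documented to release the GIL.
BUFFERS = [bytes([i]) * (4 * 1024 * 1024) for i in range(4)]


def digest(buf: bytes) -> str:
    """Hash one buffer; the C code may drop the GIL while it reads buf."""
    return hashlib.sha256(buf).hexdigest()


def hash_sequential(buffers):
    """Hash every buffer on the calling thread."""
    return [digest(b) for b in buffers]


def hash_threaded(buffers, workers=4):
    """Hash the buffers on a thread pool.

    pool.map preserves input order, so results line up with buffers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    assert hash_threaded(BUFFERS) == hash_sequential(BUFFERS)
    print("threaded and sequential digests match")
```

Because bytes objects are immutable, other threads cannot change the input while the hash function reads it with the GIL released, which is what makes this case safe.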


I could not find any other CPU calls that released the GIL (I definitely might have missed some, let me know if I did).

For example:

The _bisectmodule.c has no Py_BEGIN_ALLOW_THREADS.

The _heapqmodule.c has no Py_BEGIN_ALLOW_THREADS.

The _json.c has no Py_BEGIN_ALLOW_THREADS.

The _csv.c has no Py_BEGIN_ALLOW_THREADS.

Matthew Moisen
  • With the GIL released, an extension module cannot safely access Python objects, as this would allow another thread to change those objects out from under it. The `json` module, for example, is basically doing nothing other than reading or creating Python objects, there is no period of time during which the GIL could be released that's long enough to bother with. `hashlib`, on the other hand, is mostly just reading from an immutable `str`/`bytes` object, other threads have no way of interfering with that. – jasonharper Aug 24 '19 at 16:30
  • @jasonharper: You should make that an answer, rather than just a comment! – Blckknght Aug 24 '19 at 16:58

1 Answer


Well, NumPy does it, although it wraps the calls in its own macros (NPY_BEGIN_ALLOW_THREADS / NPY_END_ALLOW_THREADS).

According to the docs:

This group is used to call code that may take some time but does not use any Python C-API calls. Thus, the GIL should be released during its calculation.

So you can release the GIL, but only if the code that runs while it is released makes no Python C-API calls. Put another way, that code must not access Python objects.

But the code in the last four extension modules you linked is filled with C-API calls. So those modules probably do not contain enough pure-C computation, free of C-API calls, to make releasing and re-acquiring the GIL worthwhile.

Roland Smith