In my Python C Extension I am performing actions on an iterable of strings. So in a first step I call PySequence_Fast
to convert it to a list and then iterate over the elements. For each string I use PyUnicode_DATA
and then compare the strings using some criteria. So I only read from PyObjects, but never modify them.
Now I would like to process the list in parallel, which would require me to release the GIL. However I do not know which effects this has on my use case. Here are my current thoughts:
I can still use those APIs, since they are only macros, that directly read from the PyObjects without modifying them.
I have to use the APIs beforehand and store a array of structs that hold
kind
,length
anddata pointer
of the stringsI have to use the APis beforehand and have to store a copy of the strings in a array
Case 1 would be the most performant and memory efficient. However it is stated, that without acquiring the GIL it is not allowed to perform on Python objects (does this include reading access) or use Python/C API functions.
Case 2 would be the next most efficient, since at least I do not have to copy all strings. However when I am not allowed to read from Python objects while the GIL is released, I wonder whether I would even be allowed to use a pointer to the data inside the PyObject.
Case 3 would require me to copy all strings. In my case this might make the multithreaded solution slower than a sequential solutions.
I hope someone can help me understand what I am allowed to do while the GIL is released.