This is risky business, and I understand the Global Interpreter Lock to be a formidable foe of parallelism. However, if I'm using NumPy's C API (specifically the PyArray_DATA macro on a NumPy array), are there potential consequences to invoking it from multiple concurrent threads?

Note that I will still own the GIL and not be releasing it with NumPy's threading support. Also, even if NumPy makes no guarantees about thread safety but PyArray_DATA is thread-safe in practice, that's good enough for me.

I'm running Python 2.6.6 with NumPy 1.3.0 on Linux.

ide

1 Answer

Answering my own question here, but after poking into the source code for NumPy 1.3.0, I believe the answer is: Yes, PyArray_DATA is thread-safe.

  1. PyArray_DATA is defined in ndarrayobject.h:

    #define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))
    
  2. The PyArrayObject struct type is defined in the same file; the field of interest is:

    char *data;
    

    The macro only casts its argument and reads this field, so the question becomes whether it is safe to access the buffer that data points to from multiple threads.

  3. Creating a new NumPy array from scratch (i.e., not deriving it from an existing data structure) passes a NULL data pointer to PyArray_NewFromDescr, defined in arrayobject.c.

  4. This causes PyArray_NewFromDescr to invoke PyDataMem_NEW in order to allocate memory for the PyArrayObject's data field. This is simply a macro for malloc:

    #define PyDataMem_NEW(size) ((char *)malloc(size))
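
The chain above (a macro that only reads a pointer field, plus one malloc per array) can be sketched without the NumPy headers. This is a rough illustration, not NumPy's code: MockArray, MOCK_ARRAY_DATA, MOCK_DATAMEM_NEW, and separate_arrays_do_not_alias are hypothetical names that only mirror the shape of the two definitions quoted above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PyArrayObject: only the field the macro reads. */
typedef struct {
    char *data;
} MockArray;

/* Same shape as the quoted PyArray_DATA: a cast and a field read, nothing else. */
#define MOCK_ARRAY_DATA(obj) ((void *)(((MockArray *)(obj))->data))

/* Same shape as the quoted PyDataMem_NEW. */
#define MOCK_DATAMEM_NEW(size) ((char *)malloc(size))

/*
 * Allocates two arrays separately, fills each with a different byte pattern
 * through the macro, and checks that neither write disturbed the other.
 * Returns 1 if both buffers kept their own pattern, 0 otherwise
 * (including allocation failure). nbytes should be greater than zero.
 */
int separate_arrays_do_not_alias(size_t nbytes) {
    MockArray a = { MOCK_DATAMEM_NEW(nbytes) };
    MockArray b = { MOCK_DATAMEM_NEW(nbytes) };
    int ok = (a.data != NULL && b.data != NULL);

    if (ok) {
        /* The macro returns exactly the stored pointer. */
        ok = (MOCK_ARRAY_DATA(&a) == (void *)a.data) &&
             (MOCK_ARRAY_DATA(&b) == (void *)b.data);

        memset(MOCK_ARRAY_DATA(&a), 0xAA, nbytes);
        memset(MOCK_ARRAY_DATA(&b), 0x55, nbytes);

        const unsigned char *pa = (const unsigned char *)MOCK_ARRAY_DATA(&a);
        const unsigned char *pb = (const unsigned char *)MOCK_ARRAY_DATA(&b);
        for (size_t i = 0; i < nbytes; i++) {
            if (pa[i] != 0xAA || pb[i] != 0x55) {
                ok = 0;
            }
        }
    }

    free(a.data);  /* free(NULL) is a no-op, so this is safe on failure too */
    free(b.data);
    return ok;
}
```

Calling separate_arrays_do_not_alias(64) from a main function exercises both the pointer read and the per-array allocation.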
    

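In summary, PyArray_DATA is thread-safe: it only reads a pointer field and modifies no shared state. Because each array created from scratch gets its own malloc'd buffer, arrays created separately do not share memory, so it is safe to write to them from different threads.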
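
As a rough sketch of that conclusion, the following uses POSIX threads, with each thread writing to its own separately allocated buffer obtained through a PyArray_DATA-shaped macro. It assumes a pthreads platform such as Linux and involves no Python or NumPy at all: MockArray, MOCK_ARRAY_DATA, WorkerArgs, fill_worker, and run_parallel_fill are hypothetical names made up for the illustration.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS 4
#define NUM_ELEMS 1000

/* Hypothetical stand-in for PyArrayObject: only the data field. */
typedef struct {
    char *data;
} MockArray;

/* Same shape as PyArray_DATA: a cast and a field read. */
#define MOCK_ARRAY_DATA(obj) ((void *)(((MockArray *)(obj))->data))

typedef struct {
    MockArray *array;  /* the one array this thread is allowed to write */
    int value;         /* value the thread stores in every element */
} WorkerArgs;

/* Each worker writes only to its own array's buffer. */
static void *fill_worker(void *arg) {
    WorkerArgs *args = (WorkerArgs *)arg;
    int *buf = (int *)MOCK_ARRAY_DATA(args->array);
    for (size_t i = 0; i < NUM_ELEMS; i++) {
        buf[i] = args->value;
    }
    return NULL;
}

/*
 * Returns the number of elements holding an unexpected value after all
 * threads finish (0 means no thread disturbed another's buffer), or -1
 * if allocation or thread creation failed.
 */
int run_parallel_fill(void) {
    MockArray arrays[NUM_THREADS];
    WorkerArgs args[NUM_THREADS];
    pthread_t threads[NUM_THREADS];
    int allocated = 0, started = 0, mismatches = 0;

    /* One malloc per array, as in the from-scratch creation path. */
    for (; allocated < NUM_THREADS; allocated++) {
        arrays[allocated].data = (char *)malloc(NUM_ELEMS * sizeof(int));
        if (arrays[allocated].data == NULL) { mismatches = -1; break; }
    }

    if (mismatches == 0) {
        for (; started < NUM_THREADS; started++) {
            args[started].array = &arrays[started];
            args[started].value = started + 1;
            if (pthread_create(&threads[started], NULL,
                               fill_worker, &args[started]) != 0) {
                mismatches = -1;
                break;
            }
        }
    }

    /* Join every thread that was actually started. */
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }

    if (mismatches == 0) {
        for (int t = 0; t < NUM_THREADS; t++) {
            const int *buf = (const int *)MOCK_ARRAY_DATA(&arrays[t]);
            for (size_t i = 0; i < NUM_ELEMS; i++) {
                if (buf[i] != t + 1) mismatches++;
            }
        }
    }

    for (int t = 0; t < allocated; t++) {
        free(arrays[t].data);
    }
    return mismatches;
}
```

The sketch works only because no two threads touch the same buffer. If two arrays shared one data pointer, for example when one is derived from the other, concurrent writes would need their own synchronization.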

ide