I am writing a Python script, extended with C for the computationally intensive portions of the code. Most of the data passed between Python and C is contained in NumPy arrays. These arrays are attributes of an object `dataobject`. In addition, some constants and parameters are also provided to the C routines; these are collected in `dict` attributes of another object, `dictobject`.
Several calculations are performed this way:

- pre-allocate NumPy arrays in Python, as attributes of `dataobject`
- pass `dataobject` and `dictobject` to C for the calculations, writing into the above-mentioned arrays
- check/visualise the `dataobject` data in Python
The first call to one of my C functions (a calculation) runs without any problems. Back in Python, I can manipulate the pre-allocated NumPy arrays containing the data generated in C without encountering errors.
After my second call to C (another calculation), all hell breaks loose:

```
TypeError: a float is required
  File "/usr/lib/python2.7/threading.py", line 1160, in currentThread
    return _active[_get_ident()]
```
This error is traced back to the following line, practically the next action in my Python script:

```python
logger.info("some constant string ... ")
```

This looks like magic. What does an error thrown by `_get_ident()` imply? What does a call to the `logging` module have to do with anything!?
After some testing, it seems that most operations requiring some form of iteration over anything (`for`, `reduce`, etc.) result in the same `TypeError: a float is required` ... EXCEPT if, before continuing, I iterate over ANY slice of ANY of the arrays I have sent to C, even ones that I had sent in my previous calculation (also part of `dataobject`)! In this case, execution proceeds normally: the data in `dataobject` and `dictobject` is completely correct.
The "caller" (Python) code for every such calculation has an equivalent of the following structure:

```python
def mycalculation(dataobject, dictobject):
    dataobject.arr1 = np.zeros(size, dtype=float, order='C')
    # ...
    dataobject.arrn = np.zeros(size, dtype=float, order='C')
    dictobject.dict1 = {'size': 10, 'some_parameter': 2}
    # etc., for multiple dict attributes containing multiple parameters
    my_c_module.py2cfunction(dataobject, dictobject)  # wrapper for calculations in C
    return None
```
The "wrapper" (C/C++) code for every such calculation has the following structure:

```c
static PyObject* py2c_function(PyObject *self, PyObject *args)
{
    PyObject *py_dataobj = NULL;
    PyObject *py_dictobj = NULL;

    if (!PyArg_ParseTuple(args, "OO", &py_dataobj, &py_dictobj))
        return NULL;

    my_datastruct data;  /* typedefed */
    if (get_datastruct(py_dataobj, &data) != ERR_SUCCESS) {
        Py_INCREF(Py_None);
        return Py_None;
    }
    /* gets some arrays - attributes of dataobject (py_dataobj) - into a struct
       (as C arrays, using PyArray_DATA) */

    my_dictstruct dict;  /* typedefed */
    if (get_dictstruct(py_dictobj, &dict) != ERR_SUCCESS) {
        Py_INCREF(Py_None);
        return Py_None;
    }
    /* gets some entries of the dicts - attributes of dictobject (py_dictobj) -
       into a struct (as C types, e.g. using PyDict_GetItem, then PyFloat_AsDouble) */

    calculation_func(&data, &dict);
    /* pure C code from here on (no more Python C-API calls) */

    Py_INCREF(Py_None);
    return Py_None;
}
```
where the functions `get_*struct` access attributes of `py_*obj` and store them in `my_*struct`.
The program clearly breaks right after the second call to C, although that call is extremely similar in structure to the first. Could this be the result of a memory leak in my C libraries? Using `Py_REFCNT`, the reference counting seems to check out for `py_dictobj` and `py_dataobj` ...
How can my C extension break `_get_ident()`, which seems to refer to `thread.get_ident()` from `import thread` (see here)? Could you help me narrow down the possible causes of this mysterious and frustrating problem?