I'm trying to convert a 2D C array to a 2D Python list, but I get a segmentation fault when the code calls PyList_New(). The number of rows and columns is usually fixed at 50. The crash always happens at the same index, idxR = 39: PyList_New() allocates properly up to index 38 and always crashes at index 39. I'm not sure how to debug this behaviour.
Get2DList is called about 5 times in a row with different data values, and it always fails on the 5th call at idxR = 39.
Here's the code for reference:
/*
 * Function to convert a C 2D array into a Python API 2D list object.
 * This object is passed on to the Python interpreter,
 * which then passes it on to the appropriate function.
 * The Python API list object is converted to a Python list by the interpreter.
 */
PyObject* Get2DList (double** arr, size_t rows, size_t cols)
{
    PyObject* pyList2D = PyList_New(0);
    PyObject* tmpLst = NULL;
    for (int idxR = 0; idxR < rows; idxR++)
    {
        tmpLst = PyList_New(0);
        for (int idxC = 0; idxC < cols; idxC++)
        {
            PyList_Append (tmpLst, PyFloat_FromDouble (arr[idxR][idxC]));
        }
        PyList_Append (pyList2D, tmpLst);
        Py_XDECREF (tmpLst);
    }
    return pyList2D;
}

int predict(double** v10, double** pvo, double** avo, double** rh, double** slp)
{
    // Convert C data parameters to Python for transferring to the Keras model
    PyObject* py_V10 = Get2DList(v10, ROI_SIZE, ROI_SIZE);
    PyObject* py_PVO = Get2DList(pvo, ROI_SIZE, ROI_SIZE);
    PyObject* py_AVO = Get2DList(avo, ROI_SIZE, ROI_SIZE);
    PyObject* py_RH = Get2DList(rh, ROI_SIZE, ROI_SIZE);
    PyObject* py_SLP = Get2DList(slp, ROI_SIZE, ROI_SIZE);
}
I attached GDB and got a stack trace, but I can't work out what it is telling me.
Here's the stack trace for reference:
#3  0x000015555109320 in malloc_printerr (str=str@entry=0x1555511be4c1 "free(): invalid pointer") at malloc.c:5347
#4  0x000015555109fb5c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#5  0x00001554910322d4 in google::protobuf::Map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tensorflow::AttrValue>::~Map() () from /home/./.local/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x000015548b3081e5 in tensorflow::FunctionDef::~FunctionDef() from /home/./.local/lib/python3.8/site-packages/tensorflow/python/../libtensorflow_framework.so.2
#7  0x0000155490cf26b3 in TF_DeleteFunction () from /home/./.local/lib/python3.8/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#8  0x0000155527e8cd4f in pybind11::cpp_function::initialize<void (*&)(TF_Function*), void, TF_Function*, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release> >(void (*&)(TF_Function*), void (*)(TF_Function*), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)}::_FUN(pybind11::detail::function_call&) () from /home/./.local/lib/python3.8/site-packages/tensorflow/python/client/_pywrap_tf_session.so
#9  0x0000155527e8fb2f in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/./.local/lib/python3.8/site-packages/tensorflow/python/client/_pywrap_tf_session.so
#10 0x0000155555065748 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#11 0x0000155555065b26 in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#12 0x0000155554631df3 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#13 0x0000155554639ef6 in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#14 0x000015555403d06b in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#15 0x0000155555000c36 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#16 0x0000155554eebce in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#17 0x0000155554f29994 in _PyObject_GC_Malloc () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#18 0x0000155554f29c47 in _PyObject_GC_New () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#19 0x0000155555044f4c in PyList_New () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0
#20 0x0000555555557819 in Get2DList (arr=0x5555599d4c30, rows=50, cols=50) at model.c:22
#21 0x0000555555557a89 in predict (v10=0x55555f229190, pvo=0x55555a0b8990, avo=0x55555a0bbce, rh=0x555559dc4b40, slp=0x5555599d4c30) at model.c:80
#22 0x0000555555568764 in triggerMain (runPath=0x555555624670 "/home/./My_files/Research/WRF_Installation/Dummy_WRF/WRF/test/em_real", params=0x55555563dc60, simulatedTimestep=0x55555c066d00 "wrfout_d01_2022-05-05_00:00:08", cur_shmid=2981889) at trigger.c:642
#23 0x0000555555569518 in monitor (runpath=0x555555624070 "/home/./My_files/Research/WRF_Installation/Dummy_WRF/WRF/test/em_real", mainArgs=0x5555555f8810, params=0x55555563dc60) at monitor.c:215
#24 0x900055555556abc4 in main (arg=2, argv=0x7fffffffd1c8) at main.c:95
Output of info locals on frame 20:
(gdb) info locals
idxR = 39
pyList2D = 0x1550d4f74980
tmpLst = 0x1550d4f15380
The array data is fine and there is no invalid memory access in "arr". Py_Initialize() also seems fine; I believe the first 4 calls would not have succeeded if there were a problem with initialisation. The code converts the 2D C array to a Python list and passes the data to an ML model that uses TensorFlow. The C array is 50 * 50, and ROI_SIZE is a macro set to 50.
Why does the stack trace show TensorFlow code (_pywrap_tf_session) being called from inside PyList_New()?
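For readers reproducing this, note that Get2DList takes a double**, so each arr[idxR] must itself be a valid pointer to at least cols doubles. A true 2D array declared as double a[50][50] does not have that layout and cannot simply be cast to double**. Below is a minimal sketch of one layout that matches the signature. The helpers alloc_rows and free_rows are hypothetical names used only for illustration; the original program's allocation code is not shown in the question.

```c
#include <stdlib.h>

/* Hypothetical helper: builds a rows x cols table as an array of row
 * pointers into one contiguous, zero-initialised block of doubles. */
double **alloc_rows(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    double **table = malloc(rows * sizeof *table);
    if (table == NULL)
        return NULL;

    double *block = calloc(rows * cols, sizeof *block);
    if (block == NULL) {
        free(table);
        return NULL;
    }

    /* Each row pointer addresses its own slice of the block. */
    for (size_t r = 0; r < rows; r++)
        table[r] = block + r * cols;

    return table;
}

/* Hypothetical helper: releases a table created by alloc_rows. */
void free_rows(double **table)
{
    if (table != NULL) {
        free(table[0]);   /* the contiguous data block */
        free(table);      /* the array of row pointers */
    }
}
```

A table built this way can be passed directly as the arr argument, for example Get2DList(alloc_rows(ROI_SIZE, ROI_SIZE), ROI_SIZE, ROI_SIZE), provided the result is checked for NULL first.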