I need help passing a C array to Python (NumPy). I have a 2D array of doubles, NumRows x NumInputs, and it seems that PyArray_SimpleNewFromData does not convert it correctly. It is hard to tell because the debugger does not show much, only pointers.

What is the right way to pass a two-dimensional array?

int NumRows = X_test.size();
int NumInputs = X_test_row.size();

double **X_test2 = new double*[NumRows];
for(int i = 0; i < NumRows; ++i) 
{
    X_test2[i] = new double[NumInputs];
}


for(int r = 0; r < NumRows; ++r) 
{
    for(int c = 0; c < NumInputs; ++c) 
    {
        X_test2[r][c] = X_test[r][c];
    }
}




const char *ScriptFName = "100-ABN-PREDICT";
// String literals are const in C++, so point to them through const char*.
const char *FunctionName = "PredictGBC_DBG";

npy_intp Dims[2];
Dims[0]= NumRows;
Dims[1] = NumInputs;

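// Start every pointer at NULL so the cleanup code never sees an uninitialized value.
PyObject *ArgsArray = NULL;
PyObject *pName = NULL, *pModule = NULL, *pDict = NULL, *pFunc = NULL, *pValue = NULL, *pArgs = NULL;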

int row, col, rows, cols, size, type;

const double* outArray;
double ArrayItem;

//===================

Py_Initialize();

pName = PyBytes_FromString(ScriptFName);

pModule = PyImport_ImportModule(ScriptFName);

if (pModule != NULL)
{
    import_array(); // Required for the C-API

    ArgsArray = PyArray_SimpleNewFromData (2, Dims, NPY_DOUBLE, X_test2);//SOMETHING WRONG 

    pDict = PyModule_GetDict(pModule);

    pArgs = PyTuple_New (1);
    PyTuple_SetItem (pArgs, 0, ArgsArray);

    pFunc = PyDict_GetItemString(pDict, FunctionName);

    if (pFunc && PyCallable_Check(pFunc))
    {

        pValue = PyObject_CallObject(pFunc, pArgs);//CRASHING HERE

        if (pValue != NULL) 
        {
            rows = PyArray_DIM(pValue, 0);
            cols = PyArray_DIM(pValue, 1);
            size = PyArray_SIZE(pValue);
            type = PyArray_TYPE(pValue);


            // get direct access to the array data
            //PyObject* m_obj;
            outArray = static_cast<const double*>(PyArray_DATA(pValue));


            for (row=0; row < rows; row++) 
            {
                ArrayItem = outArray[row];
                y_pred.push_back(ArrayItem);
            }

        }
        else 
        {
            y_pred.push_back(EMPTY_VAL);
        }
    }
    else 
    {
        PyErr_Print();
    }//pFunc && PyCallable_Check(pFunc)



}//(pModule!=NULL
else
{
    PyErr_SetString(PyExc_TypeError, "Cannot call function ?!");
    PyErr_Print();
}




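Py_XDECREF(pValue);  // may be NULL if the call failed
Py_XDECREF(pArgs);   // the tuple owns ArgsArray (PyTuple_SetItem steals that reference)
// pFunc and pDict are borrowed references and must not be released here.
Py_XDECREF(pModule);
Py_XDECREF(pName);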


Py_Finalize (); 
klubow
  • Firstly, I see `new`, so I guess the better tag is `C++`, even if it's largely C-like what you're doing. Secondly, I would argue `X_test2` is not a 2 dimensional array, but rather an array of arrays. It just happens that each subarray is the same size (`NumInputs`), but it doesn't have to be. –  Jan 14 '15 at 10:45
  • If you don't mind using `Cython`, which is very much an accepted standard for interfacing numpy and C, you can make it a lot easier. Though in that case, it is probably easier (recommended?) to allocate the array in Python/numpy, and then pass that to your C routine to do the computations (so your second for-loop, I guess). There are some [examples](https://github.com/cython/cython/wiki/tutorials-NumpyPointerToC) at the Cython wiki to help you out. Note how that numpy array is 2D, but then passed a single pointer and used as a 1D array inside the C code. Hence (partly) my previous comment. –  Jan 14 '15 at 10:47
  • It is a bit more complicated: the C++ part is a DLL used by some other software. It should only get the data, convert it to a NumPy array, and pass it to Python, where all the calculations are done (scikit-learn). – klubow Jan 14 '15 at 10:52
  • If it's a dll, can't you use [ctypes](http://stackoverflow.com/questions/252417/how-can-i-use-a-dll-from-python)? –  Jan 14 '15 at 11:22
  • Let's focus on passing a 2D array to Python. – klubow Jan 14 '15 at 11:43

1 Answer

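You'll have to copy your data to a contiguous block of memory. To represent a 2D array, NumPy does not use an array of pointers to 1D arrays. NumPy expects the array to be stored in a single contiguous block of memory, in (by default) row-major order. PyArray_SimpleNewFromData treats the pointer you give it as the start of that block, so passing X_test2 makes NumPy interpret the row pointers themselves as the array data.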
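To make the layout difference concrete, here is a minimal sketch in plain C++ (no NumPy involved) that packs an array of row pointers into one row-major block and indexes it. The helper names `flatten_row_major` and `at` are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pack an "array of arrays" (each row allocated
// separately) into one contiguous, row-major block.
std::vector<double> flatten_row_major(const double* const* row_ptrs,
                                      std::size_t rows, std::size_t cols)
{
    std::vector<double> flat;
    flat.reserve(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // row_ptrs[r] points to its own allocation of `cols` doubles.
        flat.insert(flat.end(), row_ptrs[r], row_ptrs[r] + cols);
    }
    assert(flat.size() == rows * cols);
    return flat;
}

// Hypothetical helper: in a row-major block, element (r, c) lives at
// offset r * cols + c from the start of the block.
inline double at(const std::vector<double>& flat, std::size_t cols,
                 std::size_t r, std::size_t c)
{
    return flat[r * cols + c];
}
```

With a 2 x 3 input, `at(flat, 3, 1, 0)` reads the first element of the second row, which sits directly after the last element of the first row.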

If you create your array using PyArray_SimpleNew(...), numpy allocates the memory for you. You have to copy X_test2 to this array, using, say, std::memcpy or std::copy in a loop over the rows.

That is, change this:

ArgsArray = PyArray_SimpleNewFromData (2, Dims, NPY_DOUBLE, X_test2);//SOMETHING WRONG 

to something like this:

// PyArray_SimpleNew allocates the memory needed for the array.
ArgsArray = PyArray_SimpleNew(2, Dims, NPY_DOUBLE);

// The pointer to the array data is accessed using PyArray_DATA()
double *p = (double *) PyArray_DATA(ArgsArray);

// Copy the data from the "array of arrays" to the contiguous numpy array.
for (int k = 0; k < NumRows; ++k) {
    memcpy(p, X_test2[k], sizeof(double) * NumInputs);
    p += NumInputs;
}

(It looks like X_test2 is a copy of X_test, so you might want to modify the above code to copy directly from X_test to the numpy array.)
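A direct copy could look roughly like the sketch below. It assumes X_test is a `std::vector<std::vector<double>>`, which the question does not state explicitly, and the helper name `copy_rows_to_contiguous` is hypothetical. The destination is a plain `double*` so the sketch stays independent of NumPy; in the code above it would be the pointer returned by PyArray_DATA(ArgsArray).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy a vector-of-vectors straight into a contiguous,
// row-major destination buffer that holds at least src.size() * cols doubles.
void copy_rows_to_contiguous(const std::vector<std::vector<double>>& src,
                             std::size_t cols, double* dest)
{
    assert(dest != nullptr);
    for (const std::vector<double>& row : src) {
        // A ragged input cannot be represented as a rectangular array.
        if (row.size() != cols) {
            throw std::invalid_argument("every row must have exactly `cols` elements");
        }
        if (cols > 0) {
            std::memcpy(dest, row.data(), sizeof(double) * cols);
        }
        dest += cols;  // advance to the start of the next row
    }
}
```

The row-size check is an addition for the sketch: the question's code assumes every row has NumInputs elements, and a shorter row would otherwise make memcpy read past its end.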

Warren Weckesser
  • Thanks, I just quickly checked and it seems to work (I will investigate further later). By the way, do you know why calling cols = PyArray_DIM(pValue, 1); does not return the number of columns, i.e. array.shape[1]? It returns 8 when the numpy array holds doubles and 4 when it holds int32. – klubow Jan 14 '15 at 16:28
  • What is `pValue`? The first argument of `PyArray_DIM()` must be the python object holding the numpy array, e.g. `ArgsArray`. – Warren Weckesser Jan 14 '15 at 17:11
  • It is in the code attached to the question: pValue = PyObject_CallObject(pFunc, pArgs). It is the numpy array returned from Python. – klubow Jan 14 '15 at 18:07
  • I just figured it out (in a way): this happens when the numpy array is 1d. – klubow Jan 14 '15 at 18:41
  • Ah, that makes sense. `PyArray_DIM(arr, k)` is accessing an array with length `PyArray_NDIM(arr)`, which is 1 for a 1d array, so index 1 reads past its end. – Warren Weckesser Jan 14 '15 at 19:19
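One way to guard against this is to check the number of dimensions before indexing the shape. The sketch below is plain C++ so it can be compiled without NumPy: `ndim` and `dims` stand in for the values of PyArray_NDIM(arr) and PyArray_DIMS(arr), `std::ptrdiff_t` stands in for `npy_intp`, and `Shape2D` and `shape_as_2d` are hypothetical names. Treating a 1d result as a single column is an assumption of the sketch, not something NumPy does for you.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical result type for the sketch.
struct Shape2D {
    std::ptrdiff_t rows;
    std::ptrdiff_t cols;
};

// Hypothetical helper: read a shape that may be 1d or 2d without ever
// indexing past the end of the dims array.
Shape2D shape_as_2d(int ndim, const std::ptrdiff_t* dims)
{
    assert(dims != nullptr);
    if (ndim != 1 && ndim != 2) {
        throw std::invalid_argument("expected a 1d or 2d array");
    }
    Shape2D s;
    s.rows = dims[0];
    // dims has exactly `ndim` entries, so dims[1] exists only when ndim == 2.
    s.cols = (ndim == 2) ? dims[1] : 1;
    return s;
}
```

In the question's code, this check would go before the `rows`/`cols` assignments that follow the call to PyObject_CallObject.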