The documentation for the numpy.frombuffer function specifically says that the generated array will be one-dimensional:
Interpret a buffer as a 1-dimensional array.
I'm not sure about the consequences of this quote. The documentation just tells me that the generated array will be one dimensional, but never says that the input buffer has to describe a one-dimensional object.
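A quick check makes the quoted sentence concrete. This is a sketch that assumes NumPy is installed and uses a bytearray as a stand-in for an external buffer. Whatever the buffer holds, numpy.frombuffer returns a flat array. When the buffer is writable, the result is a writable view on the same memory, not a copy.

```python
import numpy as np

# Six float64 values packed into a writable bytes-like object (stand-in buffer).
buf = bytearray(np.arange(6, dtype=np.float64).tobytes())

# frombuffer never infers a shape: the result is always 1-D.
a = np.frombuffer(buf, dtype=np.float64)
print(a.ndim, a.shape)      # 1 (6,)

# The array is a view on buf, so writing through it changes buf itself.
a[0] = 42.0
print(a.flags.writeable)    # True
print(buf[:8] == np.float64(42.0).tobytes())  # True
```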
I have a (2D) Eigen matrix in C++. I would like to create a Python buffer which describes the content of the matrix. Then, I would like to use this buffer to somehow initialize my NumPy array and make it available to my Python scripts. The goal is both to pass information to Python without copying data and to allow Python to modify the matrix (e.g., to initialize it).
The C-API equivalent of numpy.frombuffer is PyArray_FromBuffer, and it also shares the single-dimension phrase, but it has more documentation (emphasis mine):
PyObject* PyArray_FromBuffer(PyObject* buf, PyArray_Descr* dtype, npy_intp count, npy_intp offset)
Construct a one-dimensional ndarray of a single type from an object, buf, that exports the (single-segment) buffer protocol (or has an attribute __buffer__ that returns an object that exports the buffer protocol). A writeable buffer will be tried first followed by a read-only buffer. The NPY_ARRAY_WRITEABLE flag of the returned array will reflect which one was successful. The data is assumed to start at offset bytes from the start of the memory location for the object. The type of the data in the buffer will be interpreted depending on the data-type descriptor, dtype. If count is negative then it will be determined from the size of the buffer and the requested itemsize, otherwise, count represents how many elements should be converted from the buffer.
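The Python-level numpy.frombuffer exposes the same count and offset parameters. The sketch below assumes NumPy and a float64 buffer. It shows the unit difference the quote describes: offset is measured in bytes, while count is measured in elements.

```python
import numpy as np

# Ten float64 values: 80 bytes in total.
buf = bytearray(np.arange(10, dtype=np.float64).tobytes())

# Skip the first 2 elements (offset is in BYTES: 2 * 8 = 16),
# then read exactly 3 elements (count is in ELEMENTS).
itemsize = np.dtype(np.float64).itemsize
a = np.frombuffer(buf, dtype=np.float64, count=3, offset=2 * itemsize)
print(a)       # [2. 3. 4.]

# count=-1 (the default) means "everything from offset to the end".
b = np.frombuffer(buf, dtype=np.float64, offset=2 * itemsize)
print(b.size)  # 8
```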
Does "single-segment" mean that it cannot contain padding used, e.g., to align the rows of the matrix? In that case I'm screwed, because my matrix could very well use an alignment strategy that requires padding.
Back to the original question:
Is there a way for me to create a NumPy array which shares memory with a pre-existing buffer?
Remark: there is a project on GitHub called Eigen3ToPython, which aims at connecting Eigen with Python, but it does not allow for memory sharing (emphasis mine):
This library allows to: [...] Convert to/from Numpy arrays (np.array) in a transparent manner (however, memory is not shared between both representations)
EDIT: Somebody might point out the similarly titled question "Numpy 2D- Array from Buffer?". Unfortunately, the solution given there does not seem to be valid for my case, because the generated 2D array does not share memory with the original buffer.
EDIT: how data is organized in Eigen
Eigen maps 2D matrices onto a 1D memory buffer by using strided access. A double-precision 3x2 matrix, for instance, needs 6 doubles, i.e., 48 bytes, so a 48-byte buffer is allocated. The first element in this buffer represents the [0, 0] entry in the matrix.
In order to access the element [i, j], the following formula is used:
double* v = matrix.data() + i*matrix.rowStride() + j*matrix.colStride();
where matrix is the matrix object and its member functions data(), rowStride() and colStride() return, respectively, the start address of the buffer, the distance between two consecutive rows and the distance between two consecutive columns (in multiples of the floating-point format size).
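The addressing rule can be mimicked in plain Python on a flat list. The helper name element_offset is hypothetical. As with Eigen's rowStride() and colStride(), the strides here are counted in elements, not bytes.

```python
def element_offset(i, j, row_stride, col_stride):
    """Index of entry [i, j] in a flat buffer, relative to data() (hypothetical helper)."""
    return i * row_stride + j * col_stride

# 3x2 matrix
#   3  7
#   1 -2
#   4  5
# stored column-major without padding: each column is contiguous.
flat = [3, 1, 4, 7, -2, 5]
row_stride, col_stride = 1, 3

matrix = [[flat[element_offset(i, j, row_stride, col_stride)] for j in range(2)]
          for i in range(3)]
print(matrix)  # [[3, 7], [1, -2], [4, 5]]
```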
By default Eigen uses a column-major format, thus rowStride() == 1, but it can also be configured to use a row-major format, with colStride() == 1.
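For comparison, NumPy keeps the same kind of information in ndarray.strides, but counts it in bytes rather than elements. This sketch assumes NumPy and float64 (8 bytes). A Fortran-ordered (column-major) 3x2 array gives strides matching Eigen's rowStride() == 1, colStride() == 3. A C-ordered (row-major) one matches rowStride() == 2, colStride() == 1.

```python
import numpy as np

col_major = np.zeros((3, 2), dtype=np.float64, order="F")
row_major = np.zeros((3, 2), dtype=np.float64, order="C")

# Byte strides; dividing by itemsize gives Eigen-style element strides.
item = col_major.itemsize
print(col_major.strides, [s // item for s in col_major.strides])  # (8, 24) [1, 3]
print(row_major.strides, [s // item for s in row_major.strides])  # (16, 8) [2, 1]
```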
Another important configuration option is the alignment. The data buffer could very well include some unneeded values (i.e., values which are not part of the matrix) so as to make the columns or rows start at aligned addresses. This makes the operations on the matrix vectorizable. In the example above, assuming column-major format and 16-byte alignment, the following matrix
3 7
1 -2
4 5
could be stored in the following buffer:
0 0 3 1 4 0 7 -2 5 0
The 0 values are called padding. The two 0s at the beginning could be necessary to ensure that the start of the actual data is aligned to the same boundary. (Notice that the data() member function will return the address of the 3.) In this case the strides for rows and columns are
rowStride: 1
colStride: 4
(while in the unaligned case they would be 1 and 3 respectively.)
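Applying the same addressing rule to the padded buffer shows why colStride() grows from 3 to 4. In this sketch the list index 2 stands in for the address returned by data(). It also assumes the buffer itself starts on a 16-byte boundary. The variable names are illustrative.

```python
# Padded column-major buffer from the example (16-byte alignment, 8-byte doubles).
buf = [0, 0, 3, 1, 4, 0, 7, -2, 5, 0]
data = 2                   # position of the 3, i.e. what data() points at
row_stride, col_stride = 1, 4

matrix = [[buf[data + i * row_stride + j * col_stride] for j in range(2)]
          for i in range(3)]
print(matrix)  # [[3, 7], [1, -2], [4, 5]]

# Byte offset of each column start from the beginning of the buffer.
print([(data + j * col_stride) * 8 for j in range(2)])  # [16, 48]
```

Both column starts are multiples of 16 bytes, which is what the padding buys.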
NumPy (through frombuffer) expects a contiguous buffer, and reshaping the result assumes a C-contiguous layout, i.e., a row-major structure with no padding. If Eigen inserts no padding, the row-major requirement can be worked around pretty easily for a column-major Eigen matrix: one passes the buffer to numpy.frombuffer, then reshapes and transposes the resulting ndarray. I managed to make this work perfectly.
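For the padding-free case, the reshape-and-transpose workaround can be sketched as follows. NumPy is assumed, and a bytearray stands in for the memory owned by the Eigen matrix. In the real setup, that buffer would be exported from C++.

```python
import numpy as np

# Column-major storage of the 3x2 matrix [[3, 7], [1, -2], [4, 5]], no padding.
buf = bytearray(np.array([3, 1, 4, 7, -2, 5], dtype=np.float64).tobytes())

rows, cols = 3, 2
flat = np.frombuffer(buf, dtype=np.float64)   # 1-D view, no copy
m = flat.reshape(cols, rows).T                # still a view, now 3x2

print(m.tolist())                 # [[3.0, 7.0], [1.0, -2.0], [4.0, 5.0]]
print(np.shares_memory(m, flat))  # True

# Writes through the view land in the shared buffer.
m[2, 1] = 50.0
print(flat[5])                    # 50.0
```

flat.reshape(rows, cols, order="F") produces the same view in a single call.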
But if Eigen does insert padding, the problem cannot be solved with this technique, because the ndarray will still see the zeros in the data and treat them as part of the matrix, while discarding some values at the end of the array. This is the problem I'm asking for a solution to.
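Running the same recipe on the padded buffer reproduces the problem. The sketch assumes NumPy and float64. It reads 6 elements starting at the 3 (a byte offset of 16), exactly as if the matrix had no padding.

```python
import numpy as np

# Padded column-major buffer: data() points at the 3 (element index 2).
buf = bytearray(np.array([0, 0, 3, 1, 4, 0, 7, -2, 5, 0], dtype=np.float64).tobytes())

rows, cols = 3, 2
flat = np.frombuffer(buf, dtype=np.float64, count=rows * cols, offset=2 * 8)
m = flat.reshape(cols, rows).T

print(flat.tolist())  # [3.0, 1.0, 4.0, 0.0, 7.0, -2.0]  (padding read, the 5 dropped)
print(m.tolist())     # [[3.0, 0.0], [1.0, 7.0], [4.0, -2.0]]  (wrong matrix)
```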
Now, as a side remark, since we are lucky enough to have @ggael in the loop, who can probably shed some light: I have to admit that I have never had Eigen insert any padding in my matrices, and I can't find any mention of padding in the Eigen documentation. However, I would expect the alignment strategy to align every column (or row), and not just the first one. Am I wrong in my expectations? If I am, then the whole problem does not apply to Eigen. But it would apply to other libraries I'm using which apply the alignment strategy described above, so please don't consider this last paragraph when answering the question.