NumPy 2D Array from Buffer?

Question

I have a memory map that contains a 2D array, and I would like to make a NumPy array from it. Ideally, I would like to avoid copying, since the array involved can be big.

My code looks like this:

import mmap
import numpy as np

n_bytes = 10000
tagname = "Some Tag from external System"
# the tagname argument is only available on Windows
map = mmap.mmap(-1, n_bytes, tagname)
offsets = [0, 5000]

columns = []
for offset in offsets:
    # Type and count vary in the real code; for this dummy code I simply made them up.
    # I do know the count and type for every column.
    np_type = np.dtype('f4')
    column_data = np.frombuffer(map, np_type, count=500, offset=offset)
    columns.append(column_data)

# this line seems to copy the data, which I would like to avoid
data = np.array(columns).T

Have you tried reading the whole file as a big 1D array, and then reshape it to a 2D array? — kennytm, Aug 23 '16 at 05:37
@kennytm The data can have different dtypes per column (e.g. the first block is a float, the second an int), which I cannot express with the buffer method — Christian Sauer, Aug 23 '16 at 05:51
@Julien Bernu Yes, I know how many columns, rows and bytes there are. — Christian Sauer, Aug 23 '16 at 05:51

score 6 · Answer 1 · answered Apr 04 '18 at 14:03

Assuming you have a byte array and you know its dimensions, the answer is very simple. Imagine you have the raw RGB data of an image (24 bits per pixel) in a buffer (named 'buff') whose dimensions are 1024x768.

import numpy

# read the buffer into a 1D byte array
arr = numpy.frombuffer(buff, dtype=numpy.uint8)
# now shape the array as you please: (rows, columns, channels)
arr.shape = (768, 1024, 3)
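A quick way to check that this approach does not copy is sketched below. It assumes a zero-filled `bytearray` as a stand-in for the real image buffer, and it uses `reshape` rather than assigning to `shape`; both give a view of the same memory here.

```python
import numpy as np

# Hypothetical stand-in for the raw RGB buffer: 768 rows x 1024 columns x 3 bytes
height, width, channels = 768, 1024, 3
buff = bytearray(height * width * channels)

# frombuffer wraps the existing memory; reshape only changes how it is indexed
arr = np.frombuffer(buff, dtype=np.uint8).reshape(height, width, channels)

# A write through the original buffer shows up in the array,
# so both names refer to the same memory
buff[0] = 255
first_value = int(arr[0, 0, 0])
print(first_value)
```

If `frombuffer` had copied the data, the write to `buff` would not be visible through `arr`.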

score 1 · Answer 2 · answered Aug 23 '16 at 05:48

I haven't used frombuffer much, but I think np.array works with those arrays as it does with conventionally constructed ones.

Each column_data array uses its own slice of the mmap as its data buffer. But np.array(columns) reads the values from each array in the list, and constructs a new array from them, with its own data buffer.

I like to use x.__array_interface__ to look at the data buffer location (and to see other key attributes). Compare that dictionary for each element of columns and for data.

You can construct a 2d array from a mmap using a contiguous block. Just make the 1d frombuffer array and reshape it. Even the transpose will continue to use that buffer (with F order). Slices and views also use it.
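As a rough sketch of that idea, assuming every column has the same dtype and the columns sit back to back in one contiguous block, the following uses an anonymous mmap as a stand-in for the tagged one in the question. The sizes and names are made up for illustration.

```python
import mmap
import struct
import numpy as np

n_rows, n_cols = 500, 2
item = np.dtype('f4')

# Anonymous mapping as a stand-in for the real tagged mmap
mm = mmap.mmap(-1, n_rows * n_cols * item.itemsize)

# One 1D view over the whole block, then reshape and transpose (both are views)
flat = np.frombuffer(mm, dtype=item, count=n_rows * n_cols)
data = flat.reshape(n_cols, n_rows).T   # shape (500, 2), column-major layout

# Write through the 2D view and read the bytes back from the mmap itself
data[0, 0] = 1.5
value_in_mmap = struct.unpack('f', mm[:4])[0]
shares = np.shares_memory(flat, data)
print(value_in_mmap, shares)
```

The value written through `data` can be read back from the mmap, which indicates that no intermediate copy was made.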

But unless you are really careful, you'll quickly get copies that put the data elsewhere. Simply data1 = data+1 makes a new array, as does advanced indexing such as data[[1,3,5],:]. The same goes for any concatenation.
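One way to see which operations keep the original buffer is `np.shares_memory`. The snippet below is a small sketch using a `bytearray` as a placeholder buffer; the shape is arbitrary.

```python
import numpy as np

# Placeholder buffer: 24 bytes viewed as a 6x4 uint8 array
buf = bytearray(24)
data = np.frombuffer(buf, dtype=np.uint8).reshape(6, 4)

# True means the result still points into the original buffer
checks = {
    "slice": np.shares_memory(data, data[1:4, :]),
    "transpose": np.shares_memory(data, data.T),
    "arithmetic": np.shares_memory(data, data + 1),
    "advanced indexing": np.shares_memory(data, data[[1, 3, 5], :]),
    "concatenate": np.shares_memory(data, np.concatenate([data, data])),
}
for name, shared in checks.items():
    print(name, shared)
```

Slices and the transpose report shared memory; arithmetic, advanced indexing and concatenation do not.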

2 arrays from bytestring buffers:

In [534]: x=np.frombuffer(b'abcdef',np.uint8)
In [535]: y=np.frombuffer(b'ghijkl',np.uint8)

a new array by joining them

In [536]: z=np.array((x,y))

In [538]: x.__array_interface__
Out[538]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (6,),
 'strides': None,
 'typestr': '|u1',
 'version': 3}
In [539]: y.__array_interface__['data']
Out[539]: (3013089608, True)
In [540]: z.__array_interface__['data']
Out[540]: (180817384, False)

the data buffer locations for x,y,z are totally different

But the data for reshaped x doesn't change

In [541]: x.reshape(2,3).__array_interface__['data']
Out[541]: (3013090040, True)

nor does the 2d transpose

In [542]: x.reshape(2,3).T.__array_interface__
Out[542]: 
{'data': (3013090040, True),
 'descr': [('', '|u1')],
 'shape': (3, 2),
 'strides': (1, 3),
 'typestr': '|u1',
 'version': 3}

Same data, different view

In [544]: x
Out[544]: array([ 97,  98,  99, 100, 101, 102], dtype=uint8)
In [545]: x.reshape(2,3).T
Out[545]: 
array([[ 97, 100],
       [ 98, 101],
       [ 99, 102]], dtype=uint8)
In [546]: x.reshape(2,3).T.view('S1')
Out[546]: 
array([[b'a', b'd'],
       [b'b', b'e'],
       [b'c', b'f']], 
      dtype='|S1')

Thank you for the great answer! Do you know how I can use frombuffer method when the column sizes vary? e.g. my first block contains f4, but the second f8 - I would have to do some reshaping after building the 2d array? — Christian Sauer, Aug 23 '16 at 05:54
Structured arrays allow different dtypes in fields. But in such an array an `f4` element will sit next to an `f8`, etc., in `records`, not as separate columns (blocks of `f4`, separate blocks of `f8`). I don't know of a way of mixing columns and dtypes. — hpaulj, Aug 23 '16 at 06:37
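To illustrate the record layout described in that comment, here is a minimal sketch with a made-up structured dtype (fields `a` and `b` are hypothetical names) over a `bytearray`. It only fits buffers where the `f4` and `f8` values are interleaved row by row, which is not the block layout from the question.

```python
import numpy as np

# Hypothetical record layout: one f4 and one f8 per row, packed back to back
record = np.dtype([('a', 'f4'), ('b', 'f8')])   # 12 bytes per record, no padding
n_rows = 5
buf = bytearray(n_rows * record.itemsize)

rows = np.frombuffer(buf, dtype=record)   # view with one element per record
col_a = rows['a']                         # field access is also a view
col_b = rows['b']

# Writes through the field views land in the shared buffer
col_a[:] = 1.0
col_b[:] = 2.0
print(rows[0], col_a.strides)
```

Each field view steps through the buffer one whole record at a time, so the `f4` and `f8` values alternate in memory. For separate blocks per dtype, keeping the individual 1D `frombuffer` arrays (for example in a list, as in the question) avoids the copy, at the cost of not having a single 2D array.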
