
Consider a system with n_channels channels, each transmitting n_samples samples at a given sampling rate. The 1D buffer holding the timestamps and the flat buffer holding the (n_channels, n_samples) data are allocated as follows:

from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()
ts_buffer = (c_double * n_samples)() 

I have a C++ binary library that fills the buffer. The function can be summarized as:

from ctypes import byref

fill_buffers(
    byref(data_buffer),
    byref(ts_buffer),
)

At this point, I have 2 filled buffers, one with 2048 elements (timestamps) and one with 3 * 2048 elements (data). I want to load those 2 buffers into numpy arrays as efficiently as possible.

np.frombuffer seems ideal for reading a 1D array, e.g. the timestamps, but I can't find a counterpart for N-dimensional arrays.

import numpy as np

# read the 1D timestamps array from the buffer
timestamps = np.frombuffer(ts_buffer)  # 192 ns ± 1.11 ns per loop
timestamps = np.array(ts_buffer)  # 854 ns ± 2.99 ns per loop

For now, the data array is loaded with:

data = np.array(data_buffer).reshape(-1, n_channels, order="C").T

Is there a way to get the same efficiency as np.frombuffer while specifying the output shape and the order?


This question is different from "How can I initialize a NumPy array from a multidimensional buffer?" and from "How to restore a 2-dimensional numpy.array from a bytestring?" since it is not looking for just any alternative to np.frombuffer, but for one that is as efficient.


EDIT: Why is np.frombuffer(data_buffer).reshape(-1, n_channels).T not working? With 3 channels and 1024 points (to speed up my testing), I get len(data_buffer) = 3072, but:

np.array(data_buffer).reshape(-1, 3).T.size  # 3072
np.frombuffer(data_buffer).reshape(-1, 3).T.size  # 1536

The application is a LabStreamingLayer buffer. The buffer is filled here https://github.com/labstreaminglayer/liblsl-Python/blob/87276974a311bcf7ceb3383e9d04c6bdcf302771/pylsl/pylsl.py#L854-L861 using the C++ library https://github.com/sccn/liblsl with specifically this function https://github.com/sccn/liblsl/blob/08aa186326e9a339316b7d5677ef31b3651b4aad/src/lsl_inlet_c.cpp#L180-L185

Mathieu

1 Answer


Does np.frombuffer(data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T not work correctly? The way you are doing it, np.array also treats the buffer as a 1D array until you reshape it anyway.

For me, the following code produces the right shapes. (It is hard to verify that it works correctly without a minimal working example of the data that should be in the buffers.)

import numpy as np
from ctypes import c_double, c_float

# Assume a 2-second window, 3 channels, sampled at 1024 Hz
# data: (n_channels, n_samples) = (3, 2048)
# timestamps: (n_samples,) = (2048,)
n_channels = 3
n_samples = 2048
n_data_values = n_channels * n_samples
data_buffer = (c_float * n_data_values)()  # Note: c_float is typically 32 bits, while c_double and numpy's default float are 64 bits
ts_buffer = (c_double * n_samples)()

# Create a mock buffer

input_data = np.arange(0,n_data_values, dtype=c_float)
input_data_buffer = input_data.tobytes()


timestamps = np.frombuffer(ts_buffer) 

# Note to specify the data type for the array of floats
data = np.frombuffer(input_data_buffer, dtype=c_float).reshape(-1, n_channels, order="C").T
# data has values 0,1,2 for first time point, 3,4,5 for second, and so on
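
To make the dtype point concrete, here is a minimal sketch that reads the ctypes buffer itself rather than a bytes copy. It assumes 3 channels and 1024 samples, as in the question's edit, and fills the buffer from Python as a stand-in for the C++ library (the fill step is purely illustrative). It shows that omitting dtype halves the element count, and that the reshaped result is a view on the original buffer rather than a copy.

```python
import numpy as np
from ctypes import c_float

n_channels, n_samples = 3, 1024
data_buffer = (c_float * (n_channels * n_samples))()

# Illustrative stand-in for the C++ fill: samples are interleaved by channel,
# i.e. [s0c0, s0c1, s0c2, s1c0, s1c1, s1c2, ...]
data_buffer[:] = [float(i) for i in range(len(data_buffer))]

# Without dtype, np.frombuffer interprets the bytes as float64 (8 bytes each),
# so 3072 float32 values (12288 bytes) are read as 1536 elements.
wrong = np.frombuffer(data_buffer)
print(wrong.size)  # 1536

# With the matching dtype, every float32 value is recovered.
right = np.frombuffer(data_buffer, dtype=np.float32)
print(right.size)  # 3072

# reshape and .T on a contiguous array return views, so nothing is copied.
data = right.reshape(-1, n_channels).T
print(data.shape)  # (3, 1024)
print(data[:, 0])  # [0. 1. 2.]
print(np.shares_memory(data, right))  # True
```

Because the result is a view, it reflects whatever the library writes into the buffer next; call .copy() if the values need to outlive the following fill.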
user9794
  • I'll try this one (just in case), but when I did `np.frombuffer(data_buffer)`, I got only 2048 elements (a single channel). – Mathieu Dec 06 '22 at 18:20
  • Hard to say without the code creating the buffers, but for the code I posted I get a 3,2048 data array. Perhaps the c code is not creating/filling the buffer correctly – user9794 Dec 06 '22 at 18:24
  • So there is something weird at play; I was also expecting your solution to work. But I'm getting only half of the data, even though the buffer length is correct. C++ is beyond my current knowledge. I edited my post with additional information and links, but it seems some digging into this library is required to figure out what is happening here. – Mathieu Dec 06 '22 at 18:54
  • Have you tried specifying the dtype for np.frombuffer? Perhaps np.array infers it automatically, but I needed to specify the dtype as c_float rather than c_double (which matches numpy's default data type). c_float is half the size of c_double, so you'll get twice as many values. (The sizes of c_float and c_double may be architecture/compiler dependent, but I'm not very familiar with C/C++ bindings in Python.) – user9794 Dec 06 '22 at 19:04
  • And I'm an idiot.. obviously, I need to load with the correct `dtype`.. Thanks a lot for the help, works like a charm!! – Mathieu Dec 06 '22 at 19:14
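
As a side note not raised in the thread, np.ctypeslib.as_array is another way to wrap a ctypes array without copying, and it takes the dtype from the ctypes element type, which avoids the mismatch discussed above. The sketch below uses a small 3-channel, 4-sample buffer and a manual write in place of the library call; both are assumptions for illustration only.

```python
import numpy as np
from ctypes import c_float

n_channels, n_samples = 3, 4
data_buffer = (c_float * (n_channels * n_samples))()

# as_array infers float32 from c_float and shares memory with the buffer.
flat = np.ctypeslib.as_array(data_buffer)
print(flat.dtype)  # float32

# Same layout handling as with np.frombuffer: interleaved samples -> (channels, samples)
data = flat.reshape(n_samples, n_channels).T

# Writing to the ctypes buffer is visible through the numpy view (no copy was made).
data_buffer[4] = 42.0  # flat index 4 = sample 1, channel 1
print(data[1, 1])  # 42.0
```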