
I'm receiving blocks of audio samples of varying length from a stream; each block is a 1D ndarray. A block may arrive every 50ms or less, and I need to keep a buffer of the last 48000 samples.

I've tried defining my buffer like this:

buffer = np.zeros([48000], dtype=np.float32)

Then in my receive block function:

buffer = np.concatenate([buffer, input_block])
buffer = np.delete(buffer, slice(0, np.size(input_block)))

However this is too slow. I understand this is causing a resize and copy of array elements and is not optimal.

I've tried a variety of circular buffer implementations, such as this and this, but they are much slower, and I'm not sure why.

Rather than concatenating each new input_block upon receipt, I expect it would be more efficient to concatenate a list of past input_blocks only at the point where I actually need to read from the buffer (sketched below). There's a bit of complexity to doing this given the varying size of each block, but it should be possible.
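Something like this is what I have in mind (the receive_block/read_buffer names are placeholders, and the deque is pre-seeded with zeros to match my current initialization):

from collections import deque
import numpy as np

BUFFER_SIZE = 48000

# Pre-seed with zeros so reads are well-defined before the stream fills up.
blocks = deque([np.zeros(BUFFER_SIZE, dtype=np.float32)])
total = BUFFER_SIZE  # total samples currently held across all blocks

def receive_block(input_block):
    """Cheap append: just store a reference to the block."""
    global total
    blocks.append(input_block)
    total += len(input_block)
    # Drop whole blocks that can no longer contribute to the last 48000 samples.
    while total - len(blocks[0]) >= BUFFER_SIZE:
        total -= len(blocks.popleft())

def read_buffer():
    """Pay the concatenation cost once, only when a read is needed."""
    return np.concatenate(blocks)[-BUFFER_SIZE:]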

Is there another approach I should consider?

davegravy
  • The first package 'appends' each value to the buffer in a [for loop](https://github.com/vstadnytskyi/circular_buffer_numpy/blob/master/circular_buffer_numpy/circular_buffer.py#L73) when it should be done with max 2 slices, [for example](https://github.com/eric-wieser/numpy_ringbuffer/blob/master/numpy_ringbuffer/__init__.py#L115). I tested your approach and could append ~2000 slices of size 5000 to 12000 in 50ms (with a ring buffer ~8500 slices in 50ms). Are you sure this is the bottleneck in the code? – Michael Szczesny Aug 13 '22 at 06:50
  • This is all that the callback (data ready) function in my code does, so was my prime candidate. It is possible there's a bottleneck upstream of my code in the sounddevices package. – davegravy Aug 13 '22 at 12:54
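For reference, a two-slice append of the kind described in the first comment above might look like this (a minimal sketch, not the linked packages' actual code):

import numpy as np

class RingBuffer:
    """Fixed-size ring buffer; each append is at most two slice copies."""

    def __init__(self, size, dtype=np.float32):
        self.data = np.zeros(size, dtype=dtype)
        self.size = size
        self.pos = 0  # index of the next write

    def append(self, block):
        n = len(block)
        if n >= self.size:
            # An oversized block simply replaces the whole buffer.
            self.data[:] = block[-self.size:]
            self.pos = 0
            return
        end = self.pos + n
        if end <= self.size:
            self.data[self.pos:end] = block           # fits: one slice
        else:
            first = self.size - self.pos
            self.data[self.pos:] = block[:first]      # wraps: two slices
            self.data[:end - self.size] = block[first:]
        self.pos = end % self.size

    def read(self):
        """Return the samples oldest-to-newest (this makes a copy)."""
        return np.concatenate((self.data[self.pos:], self.data[:self.pos]))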

1 Answer


Your code first makes a new array with the new block joined onto the buffer, and then makes another new array with a same-size block removed from the start. You could reduce the new-array work by removing the leading block first:

buffer = np.concatenate((buffer[input_block.shape[0]:], input_block))

It still makes a new array, but only once. buffer[input_block.shape[0]:] is a view, a slice of the old buffer, not a copy.

While we discourage repeated use of np.concatenate because it makes a new array, with all the attendant copying, it is at least compiled. np.delete is a general-purpose function written in Python: at some level it also ends up doing some sort of concatenate, but it first has to handle generic inputs.

Another idea would be to make a new buffer, and copy values to it yourself.

new_buffer = np.zeros_like(buffer)
n = input_block.shape[0]
new_buffer[:-n] = buffer[n:]      # surviving samples, shifted left
new_buffer[-n:] = input_block     # new block at the end
buffer = new_buffer

The accepted answer in your second link does something similar for a one-element shift:

x[:-1] = x[1:]; x[-1] = newvalue

Adapted to a whole block of n samples, the same in-place shift is

buffer[:-n] = buffer[n:]
buffer[-n:] = input_block
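Wrapped into a receive callback, that in-place version might look like this (a minimal sketch; BUFFER_SIZE and the guard against oversized blocks are my additions):

import numpy as np

BUFFER_SIZE = 48000
buffer = np.zeros(BUFFER_SIZE, dtype=np.float32)

def receive_block(input_block):
    # All writes go through views, so the preallocated array is reused.
    if len(input_block) >= BUFFER_SIZE:
        buffer[:] = input_block[-BUFFER_SIZE:]  # block covers the whole buffer
        return
    n = len(input_block)
    buffer[:-n] = buffer[n:]    # shift surviving samples left, in place
    buffer[-n:] = input_block   # write the new block at the end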

timings

In [95]: %%timeit buffer = np.zeros(10000); new = np.ones(100)
    ...: buffer= np.concatenate((buffer, new))
    ...: buffer= np.delete(buffer, slice(0,100))
26.7 µs ± 31.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [96]: %%timeit buffer = np.zeros(10000); new = np.ones(100)
    ...: buffer= np.concatenate((buffer[100:], new))
10.5 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [97]: %%timeit buffer = np.zeros(10000); new = np.ones(100)
    ...: buffer[:-100]=buffer[100:]
    ...: buffer[-100:]=new
5.09 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [98]: %%timeit buffer = np.zeros(10000); new = np.ones(100)
    ...: buffer1=np.zeros_like(buffer)
    ...: buffer1[:-100]=buffer[100:]
    ...: buffer1[-100:]=new
25.7 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
hpaulj