22

With numpy arrays, I want to perform this operation:

  • move x[1],...,x[n-1] to x[0],...,x[n-2] (left shift),
  • write a new value in the last index: x[n-1] = newvalue.

This is similar to a pop() followed by a push(newvalue) on a first-in last-out queue (only inverted).

A naive implementation is: x[:-1] = x[1:]; x[-1] = newvalue.

Another implementation, using np.concatenate, is slower: np.concatenate((x[1:], np.array(newvalue).reshape(1,)), axis=0).

Is there a faster way to do it?

Næreen
  • Note: this is *not* the same question as in https://stackoverflow.com/questions/30262736/fastest-way-to-shift-a-numpy-array – Næreen Mar 13 '17 at 18:42
  • 2
    The 'naive' version looks good to me. Why should there be something faster? You have to copy values, either to a new array or itself. When I test your code on `x=np.arange(100000)` I get times like `21.5 µs per loop`. That looks fast to me. – hpaulj Mar 13 '17 at 18:54
  • 4
    There's no way to do this without copying the contents of the array, so I don't think you can do better than the "naive" approach. If this is a bottleneck then you might want to consider using a different datastructure, e.g. a [`deque`](https://docs.python.org/2/library/collections.html#collections.deque), where the push-and-pop operation does not require a copy and can be done in constant time. – ali_m Mar 13 '17 at 20:21
  • OK, thanks for your replies, that was my intuition also. I am trying with a deque instead. – Næreen Mar 15 '17 at 18:21
  • Oh, in fact in the algorithm, not only `X[0]` and `X[1]` are needed, but also a value in the middle of the array, so a `deque` is useless. Sorry! Thanks anyway for your replies! – Næreen Mar 16 '17 at 08:12

2 Answers

20

After some experiments, it is clear that:

  • copying is required,
  • and the fastest and simplest way to do that, for numpy arrays, is slicing and copying.

So the solution is: x[:-1] = x[1:]; x[-1] = newvalue.

Here is a small benchmark:

>>> x = np.random.randint(0, 1e6, 10**8); newvalue = -100
>>> %timeit x[:-1] = x[1:]; x[-1] = newvalue
1000 loops, best of 3: 73.6 ms per loop
>>> %timeit np.concatenate((x[1:], np.array(newvalue).reshape(1,)), axis=0) 
1 loop, best of 3: 339 ms per loop

But if you don't need fast access to every value in the array, only to the first or last ones, using a deque is smarter.
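
For completeness, here is a minimal sketch of that alternative using collections.deque with a maxlen (this snippet is mine, not part of the original answer). Appending to a full deque drops the oldest element in O(1); note, however, that indexing into the middle of a deque is O(n), which is why the question's author eventually stayed with a numpy array (see the comments above).

>>> from collections import deque
>>> x = deque([1, 2, 3, 4, 5], maxlen=5)
>>> x.append(-100)   # the leftmost item (1) is dropped automatically
>>> x
deque([2, 3, 4, 5, -100], maxlen=5)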

Næreen
12

I know I'm late and this question has been satisfactorily answered, but I was just facing something similar for recording a buffer of streaming data.

You mentioned "first-in last-out" which is a stack, but your example demonstrates a queue, so I will share a solution for a queue that does not require copying to enqueue new items. (You will eventually need to do one copy using numpy.roll to pass the final array to another function.)

You can use a circular array with a pointer that tracks where the tail is (the place you will be adding new items to the queue).

If you start with this array:

x[0], x[1], x[2], x[3], x[4], x[5]
                               /\
                              tail

and you want to drop x[0] and add x[6], you can do this in the originally allocated memory for the array, without the need for a copy:

x[6], x[1], x[2], x[3], x[4], x[5]
 /\
tail

and so on...

x[6], x[7], x[2], x[3], x[4], x[5]
       /\
      tail

Each time you enqueue you move the tail one spot to the right. You can use modulus to make this wrap nicely: new_tail = (old_tail + 1) % length.

Finding the head of the queue is always one spot after the tail. This can be found using the same formula: head = (tail + 1) % length.

            head
             \/
x[6], x[7], x[2], x[3], x[4], x[5]
       /\
      tail

Here is an example of the class I created for this circular buffer/array:

# benchmark_circular_buffer.py
import numpy as np

# all operations are O(1) and don't require copying the array
# except to_array which has to copy the array and is O(n)
class RecordingQueue1D:
    def __init__(self, object: object, maxlen: int):
        #allocate the memory we need ahead of time
        self.max_length: int = maxlen
        self.queue_tail: int = maxlen - 1
        o_len = len(object)
        if (o_len == maxlen):
            self.rec_queue = np.array(object, dtype=np.int64)
        elif (o_len > maxlen):
            self.rec_queue = np.array(object[o_len-maxlen:], dtype=np.int64)
        else:
            self.rec_queue = np.append(np.array(object, dtype=np.int64), np.zeros(maxlen-o_len, dtype=np.int64))
            self.queue_tail = o_len - 1

    def to_array(self) -> np.ndarray:
        head = (self.queue_tail + 1) % self.max_length
        return np.roll(self.rec_queue, -head) # this will force a copy

    def enqueue(self, new_data: np.array) -> None:
        # move tail pointer forward then insert at the tail of the queue
        # to enforce max length of recording
        self.queue_tail = (self.queue_tail + 1) % self.max_length        
        self.rec_queue[self.queue_tail] = new_data

    def peek(self) -> int:
        queue_head = (self.queue_tail + 1) % self.max_length
        return self.rec_queue[queue_head]

    def replace_item_at(self, index: int, new_value: int):
        loc = (self.queue_tail + 1 + index) % self.max_length
        self.rec_queue[loc] = new_value

    def item_at(self, index: int) -> int:
        # the item we want will be at head + index
        loc = (self.queue_tail + 1 + index) % self.max_length
        return self.rec_queue[loc]

    def __repr__(self):
        return "tail: " + str(self.queue_tail) + "\narray: " + str(self.rec_queue)

    def __str__(self):
        return "tail: " + str(self.queue_tail) + "\narray: " + str(self.rec_queue)
        # return str(self.to_array())


rnd_arr = np.random.randint(0, 1e6, 10**8)
new_val = -100

slice_arr = rnd_arr.copy()
c_buf_arr = RecordingQueue1D(rnd_arr.copy(), len(rnd_arr))

# Test speed for enqueueing a new item
# swapping items 100 and 1000
# swapping items 10000 and 100000
def slice_and_copy():
    slice_arr[:-1] = slice_arr[1:]
    slice_arr[-1] = new_val
    old = slice_arr[100]
    slice_arr[100] = slice_arr[1000]
    old = slice_arr[10000]
    slice_arr[10000] = slice_arr[100000]

def circular_buffer():
    c_buf_arr.enqueue(new_val)
    old = c_buf_arr.item_at(100)
    c_buf_arr.replace_item_at(100, c_buf_arr.item_at(1000))
    old = c_buf_arr.item_at(10000)
    c_buf_arr.replace_item_at(10000, c_buf_arr.item_at(100000))

# lets add copying the array to a new numpy.array
# this will take O(N) time for the circular buffer because we use numpy.roll()
# which copies the array.
def slice_and_copy_assignment():
    slice_and_copy()
    my_throwaway_arr = slice_arr.copy()
    return my_throwaway_arr

def circular_buffer_assignment():
    circular_buffer()
    my_throwaway_arr = c_buf_arr.to_array().copy()
    return my_throwaway_arr


# test using
# python -m timeit -s "import benchmark_circular_buffer as bcb" "bcb.slice_and_copy()"
# python -m timeit -s "import benchmark_circular_buffer as bcb" "bcb.circular_buffer()" 
# python -m timeit -r 5 -n 4 -s "import benchmark_circular_buffer as bcb" "bcb.slice_and_copy_assignment()"
# python -m timeit -r 5 -n 4 -s "import benchmark_circular_buffer as bcb" "bcb.circular_buffer_assignment()" 

When you have to enqueue a lot of items without needing to hand off a copy of the array, this is a couple of orders of magnitude faster than slicing.

Accessing items and replacing items is O(1). Enqueue and peek are both O(1). Copying the array takes O(n) time.
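
For reference, here is a tiny usage sketch of the class above (my own illustration with made-up values, not part of the benchmark below):

# illustrative only: a 4-element recording queue
q = RecordingQueue1D([10, 20, 30, 40], maxlen=4)
q.enqueue(50)           # overwrites the oldest value (10) in place, O(1)
print(q.peek())         # 20 -- the current head of the queue
print(q.item_at(3))     # 50 -- the newest value
print(q.to_array())     # [20 30 40 50] -- O(n) copy via numpy.roll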

Benchmarking Results:

(thermal_venv) PS X:\win10\repos\thermal> python -m timeit -s "import benchmark_circular_buffer as bcb" "bcb.slice_and_copy()"
10 loops, best of 5: 36.7 msec per loop

(thermal_venv) PS X:\win10\repos\thermal> python -m timeit -s "import benchmark_circular_buffer as bcb" "bcb.circular_buffer()" 
200000 loops, best of 5: 1.04 usec per loop

(thermal_venv) PS X:\win10\repos\thermal> python -m timeit -s "import benchmark_circular_buffer as bcb" "bcb.slice_and_copy_assignment()"
2 loops, best of 5: 166 msec per loop

(thermal_venv) PS X:\win10\repos\thermal> python -m timeit -r 5 -n 4 -s "import benchmark_circular_buffer as bcb" "bcb.slice_and_copy_assignment()"
4 loops, best of 5: 159 msec per loop

(thermal_venv) PS X:\win10\repos\thermal> python -m timeit -r 5 -n 4 -s "import benchmark_circular_buffer as bcb" "bcb.circular_buffer_assignment()" 
4 loops, best of 5: 511 msec per loop

There is a test script and an implementation that handles 2D arrays on my GitHub here.

L Co
  • 1
    Could you add a benchmark like the one I did above (https://stackoverflow.com/a/42828629/5889533), to compare? – Næreen Feb 28 '21 at 21:43
  • @Næreen that is a good idea, I'll try that out and add the results. – L Co Feb 28 '21 at 21:57
  • 1
    @Næreen I made some significant speed improvements to the constructor and added a method to replace items at any spot in the queue. It was silly not to do benchmarking and go right to staging this for production. I knew it would be faster, but that was still lazy of me, so I'm glad you gave me the push. As I expected, for enqueuing new items the circular buffer is several magnitudes faster than the slice method 1 usec vs 36 msec. There is significantly more overhead if you need to copy the array though, so you would want to choose the appropriate solution for a given scenario. – L Co Feb 28 '21 at 23:44
  • 1
    Great job @L-co. Your module is interesting! It requires more code than the naive slice-and-write method I described above (a few years ago), but it's smart and efficient! – Næreen Mar 03 '21 at 23:05