I have a program whose current performance bottleneck involves creating a new array from a relatively short list of relatively long, flat arrays:
import numpy as np

num_arrays = 5
array_length = 1000
# a short list of long, flat arrays
arrays = [np.random.random((array_length,)) for _ in range(num_arrays)]
new_array = np.array(arrays)  # stacked into shape (num_arrays, array_length)
In other words, stacking n arrays of shape (s,) into new_array of shape (n, s).
I am looking for the most efficient way to compute this, since this operation is repeated millions of times.
I tested the performance of the two trivial ways to do this:
%timeit np.array(arrays)
>>> 3.6 µs ± 67.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.stack(arrays)
>>> 9.61 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I am currently using np.array(arrays), but I am wondering if there is a more efficient way to do this.
Some details which might help:
- The length of the arrays is fixed throughout the runtime of the program, e.g. 1000.
- The number of arrays is usually low, usually <=5. It is possible to get an upper bound for this at checkpoints throughout the run of the program (i.e. every ~1000 creations of such arrays), but not in advance; see the sketch after this list.
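To make that last point concrete, something like the sketch below is the kind of preallocation that upper bound would allow. The names upper_bound, buffer and stack_into_buffer are just placeholders, and I have not measured whether this actually beats np.array(arrays):

import numpy as np

array_length = 1000   # fixed for the whole run (first bullet above)
upper_bound = 5       # hypothetical bound obtained at a checkpoint

# Reuse one preallocated buffer instead of allocating a new array each time.
buffer = np.empty((upper_bound, array_length))

def stack_into_buffer(arrays):
    # Copy each 1-D array into a row of the preallocated buffer and
    # return a view over the rows actually used (note: a view, not a copy).
    n = len(arrays)
    for i, a in enumerate(arrays):
        buffer[i] = a
    return buffer[:n]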