
I have a program whose current performance bottleneck involves creating a new array from a relatively short list of relatively long, flat arrays:

import numpy as np

num_arrays = 5
array_length = 1000

# A short list of long, flat (1-D) arrays.
arrays = [np.random.random((array_length,)) for _ in range(num_arrays)]

# Stack them into a single array of shape (num_arrays, array_length).
new_array = np.array(arrays)

In other words, stacking n arrays of shape (s,) into new_array of shape (n, s).

I am looking for the most efficient way to compute this, since this operation is repeated millions of times.

I tested the performance of the two trivial ways of doing this:

%timeit np.array(arrays)
>>> 3.6 µs ± 67.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.stack(arrays)
>>> 9.61 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

I am currently using np.array(arrays), but I am wondering if there is a more efficient way to do this.

Some details which might help:

  • The length of the arrays is fixed throughout the runtime of the program, e.g. 1000.
  • The number of arrays is usually low, typically <= 5. It is possible to get an upper bound for it at checkpoints throughout the run of the program (i.e. every ~1000 creations of such arrays), but not in advance (a preallocation sketch based on this is shown below).
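Building on the fixed array length and the upper bound known at checkpoints, one idea (also raised in the comments below) is to preallocate the output buffer once and copy each source array into a row of it, instead of letting np.array allocate a fresh array on every call. The sketch below is only illustrative: the helper name stack_into and the buffer sized to an assumed upper bound are not from the original post, and whether this actually beats np.array(arrays) at this size has to be measured with %timeit.

import numpy as np

def stack_into(buffer, arrays):
    # Copy each 1-D array into a row of a preallocated 2-D buffer and
    # return a view of the rows that were actually filled.
    n = len(arrays)
    for i, a in enumerate(arrays):
        buffer[i, :] = a          # row-wise copy, no new allocation
    return buffer[:n]

# Buffer sized to the fixed array length and an assumed upper bound on n.
max_arrays, array_length = 5, 1000
buffer = np.empty((max_arrays, array_length))

arrays = [np.random.random((array_length,)) for _ in range(3)]
new_array = stack_into(buffer, arrays)   # shape (3, 1000), a view into buffer

Note that the result shares memory with the reused buffer, so it has to be consumed (or copied) before the next call overwrites it.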
  • Why should there be anything better? Either way you are creating a (5,1000) array. You could try initializing that array first, and then assigning values. `np.stack` uses `np.concatenate`, just adding a dimension to the arrays first. `np.vstack` does the same. – hpaulj Apr 30 '21 at 16:57
  • https://stackoverflow.com/questions/67230123 already talks about possible solutions. The problem is that you are using the CPython *interpreter* to solve a low-latency problem, and CPython is clearly not designed for that. Everything in a loop executed millions of times will likely be an issue. You could save maybe a few microseconds on some instructions, but optimizing such a loop will quickly become a nightmare with CPython. So, please use the right tool for this. – Jérôme Richard Apr 30 '21 at 20:59
  • Thank you @JérômeRichard. What would you suggest instead? Using PyPy? Porting to C++? Most of our code is native Python with very few external libraries, mostly numpy. However, the program uses multiple processes through the stdlib `multiprocessing`, which I'm not sure will work with PyPy. – Nur L May 01 '21 at 08:58
  • 1
    Well, as I stated in the post, you can use Numba or Cython, but rewriting your code in C++ is probably better (possibly with libraries like Eigen, although it takes some time to learn). If you do not want to rewrite the hot parts of your code in C++, Numba already do a quite good job to reach high performance. But is may not be enough here. There is no free lunch. – Jérôme Richard May 01 '21 at 09:08
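Following the Numba suggestion in the comments above, here is a minimal sketch of moving the row-copy into a compiled function. It assumes Numba is installed; the function name copy_rows is illustrative, the arrays are passed as a tuple so Numba sees a homogeneous container, and the overhead of calling a jitted function from CPython for arrays this small may cancel out any gain, so it needs to be benchmarked against np.array(arrays).

import numba as nb
import numpy as np

@nb.njit(cache=True)
def copy_rows(out, arrays):
    # Copy each 1-D array into the corresponding row of a preallocated buffer.
    i = 0
    for a in arrays:
        out[i, :] = a
        i += 1
    return out

array_length = 1000
arrays = [np.random.random((array_length,)) for _ in range(5)]

out = np.empty((len(arrays), array_length))
new_array = copy_rows(out, tuple(arrays))   # first call triggers compilation

Because Numba specializes on the tuple length, each distinct number of arrays compiles one extra variant; with at most 5 arrays that only happens a handful of times.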

0 Answers