I have a large 2D numpy array. I would like to be able to efficiently run row-wise operations on subsets of the columns, without copying the data.
In what follows,

a = np.arange(10000000).reshape(1000, 10000)
and

columns = np.arange(1, 10000, 2)

For reference,
In [4]: %timeit a.sum(axis=1)
7.26 ms ± 431 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
The approaches I am aware of are:
- fancy indexing with an array of column indices
In [5]: %timeit a[:, columns].sum(axis=1)
42.5 ms ± 197 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
- fancy indexing with a boolean mask of columns
In [6]: cols_mask = np.zeros(10000, dtype=bool)
...: cols_mask[columns] = True
In [7]: %timeit a[:, cols_mask].sum(axis=1)
42.1 ms ± 302 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
- masked array
In [8]: cells_mask = np.ones((1000, 10000), dtype=bool)  # in np.ma, True means "masked out"
In [9]: cells_mask[:, columns] = False  # unmask only the selected columns
In [10]: am = np.ma.masked_array(a, mask=cells_mask)
In [11]: %timeit am.sum(axis=1)
80 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
- plain Python loop over column views
In [12]: %timeit sum([a[:, i] for i in columns])
31.2 ms ± 531 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
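To double-check which of these copy the data, np.shares_memory distinguishes copies from views:

import numpy as np

a = np.arange(10000000).reshape(1000, 10000)
columns = np.arange(1, 10000, 2)

# fancy indexing (with indices or a boolean mask) materializes a new array
np.shares_memory(a, a[:, columns])     # False: copy
# a single column is a strided view into a, so the loop never copies
np.shares_memory(a, a[:, columns[0]])  # True: view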
Somewhat surprisingly to me, the last approach is the most efficient; moreover, it avoids copying the full data, which for me is a prerequisite. However, it is still much slower than the plain sum (which runs on double the data), and, most importantly, it is not trivial to generalize to other operations (e.g., cumsum).
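For example, a row-wise cumsum over the selected columns in the same view-based style needs a hand-rolled accumulation, something like this sketch:

import numpy as np

a = np.arange(10000000).reshape(1000, 10000)
columns = np.arange(1, 10000, 2)

# equivalent to a[:, columns].cumsum(axis=1), built from column views
# without copying a; every new operation needs its own bespoke loop
out = np.empty((a.shape[0], len(columns)), dtype=a.dtype)
acc = np.zeros(a.shape[0], dtype=a.dtype)
for j, i in enumerate(columns):
    acc += a[:, i]   # a[:, i] is a view, no copy of a
    out[:, j] = acc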
Is there any approach I am missing? I would be fine with writing some Cython code, but I would like the approach to work for any numpy function, not just sum.
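To make the goal concrete, this is the interface I am after (the helper name subset_apply is mine, purely for illustration); this reference version has the right semantics but performs exactly the copy I want to avoid:

import numpy as np

def subset_apply(a, columns, func):
    # reference semantics only: func is any numpy function taking an
    # axis argument (np.sum, np.cumsum, ...); the a[:, columns] here
    # is the copy I am trying to get rid of
    return func(a[:, columns], axis=1)

a = np.arange(10000000).reshape(1000, 10000)
columns = np.arange(1, 10000, 2)
row_sums = subset_apply(a, columns, np.sum)      # shape (1000,)
row_cums = subset_apply(a, columns, np.cumsum)   # shape (1000, 5000)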