numpy: boolean indexing and memory usage

Question

Consider the following numpy code:

A[start:end] = B[mask]

Here:

A and B are 2D arrays with the same number of columns;
start and end are scalars;
mask is a 1D boolean array;
(end - start) == sum(mask).

In principle, the above operation can be carried out using O(1) temporary storage, by copying elements of B directly into A.

Is this what actually happens in practice, or does numpy construct a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?

Sven Marnach · Answer 1 · 2011-05-11T11:37:33.513

3

The line

A[start:end] = B[mask]

will -- according to the Python language definition -- first evaluate the right hand side, yielding a new array containing the selected rows of B and occupying additional memory. The most efficient pure-Python way I'm aware of to avoid this is to use an explicit loop:

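from itertools import compress
# zip() is lazy on Python 3; on Python 2 use itertools.izip (and xrange) instead
for i, b in zip(range(start, end), compress(B, mask)):
    A[i] = b  # copy one selected row of B at a time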

Of course this will be much less time-efficient than your original code, but it only uses O(1) additional memory. Also note that itertools.compress() is available in Python 2.7 or 3.1 or above.
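A possible middle ground between the one-line assignment and the row-by-row loop is to process B in blocks of rows, so that the temporary created by boolean indexing never holds more than one block. The sketch below is only an illustration under stated assumptions: the function name masked_copy_chunked and the chunk_rows parameter are hypothetical, and A and B are assumed not to overlap in memory.

```python
import numpy as np

def masked_copy_chunked(A, start, B, mask, chunk_rows=1024):
    """Copy the rows of B selected by mask into A, beginning at row `start`.

    Hypothetical helper: each temporary holds at most `chunk_rows` rows.
    Returns the index one past the last row written in A.
    """
    pos = start
    for lo in range(0, B.shape[0], chunk_rows):
        hi = min(lo + chunk_rows, B.shape[0])
        sub_mask = mask[lo:hi]            # basic slice: a view, no copy
        selected = B[lo:hi][sub_mask]     # temporary of at most chunk_rows rows
        A[pos:pos + len(selected)] = selected
        pos += len(selected)
    return pos

# Small usage example with made-up data.
rng = np.random.default_rng(0)
B = rng.random((10, 3))
mask = np.array([True, False, True, True, False,
                 False, True, False, True, True])
A = np.zeros((8, 3))
start = 1
end = masked_copy_chunked(A, start, B, mask, chunk_rows=4)
```

The extra memory is then proportional to chunk_rows rather than to sum(mask), while most of the work still happens inside numpy rather than in a Python-level loop over single rows.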

edited May 11 '11 at 11:37

answered May 11 '11 at 09:53

Sven Marnach


1

Surely, "yielding a new array containing the selected rows of B and occupying additional memory" is a non sequitur? It's up to `B.__getitem__()` to choose what it wants to return. For example, if `mask` were a `slice`, a proxy (view) would be returned, and no copy would take place. – NPE May 11 '11 at 11:52
@aix: According to the OP, `mask` is a one-dimensional Boolean array. Did I miss anything? – Sven Marnach May 11 '11 at 12:12
@aix: Oh, I see. The part with the language definition is a bit ambiguous. It was only meant to refer to the part "first evaluate the right hand side". – Sven Marnach May 11 '11 at 12:14
Yes, I think we understand each other. – NPE May 11 '11 at 12:18

score 2 · Accepted Answer · answered May 11 '11 at 09:52

2

Using a boolean array as an index is fancy indexing, so numpy needs to make a copy. If you are running into memory problems, you could write a Cython extension to deal with it.
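The copy can be observed directly. The following is a minimal check, assuming a numpy version that provides np.shares_memory; the array contents and variable names are made up for illustration.

```python
import numpy as np

B = np.arange(12).reshape(4, 3)
mask = np.array([True, False, True, False])

view = B[0:2]      # basic slicing returns a view onto B's buffer
picked = B[mask]   # boolean (fancy) indexing returns a newly allocated array

shares_slice = np.shares_memory(B, view)
shares_mask = np.shares_memory(B, picked)
print(shares_slice, shares_mask)  # True False
```

Writing into view changes B, whereas writing into picked leaves B untouched, which is why the right-hand side of the original assignment needs its own storage.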

answered May 11 '11 at 09:52

tillsten


+1 for bringing in Cython. It is this kind of loop that it excels at. – Björn Pollex May 11 '11 at 09:58
