
Consider the following numpy code:

A[start:end] = B[mask]

Here:

  • A and B are 2D arrays with the same number of columns;
  • start and end are scalars;
  • mask is a 1D boolean array;
  • (end - start) == sum(mask).

In principle, the above operation can be carried out using O(1) temporary storage, by copying elements of B directly into A.

Is this what actually happens in practice, or does numpy construct a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?

NPE

2 Answers


The line

A[start:end] = B[mask]

will, according to the Python language definition, first evaluate the right-hand side, yielding a new array containing the selected rows of B and occupying additional memory. The most efficient pure-Python way I'm aware of to avoid this is to use an explicit loop:

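from itertools import compress

# compress() lazily yields the rows of B where mask is True; zip() pairs each
# one with its destination row in A (Python 3; on Python 2 use itertools.izip).
for i, b in zip(range(start, end), compress(B, mask)):
    A[i] = b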

Of course this will be much less time-efficient than your original code, but it only uses O(1) additional memory. Also note that itertools.compress() is only available in Python 2.7 and in Python 3.1 or later.
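A middle ground between the one-shot assignment and the row-by-row loop, sketched here under the assumption that a bounded temporary is acceptable, is to process B in fixed-size blocks. Each block's boolean selection is still copied, but that temporary never exceeds `chunk` rows, and most of the work stays inside numpy. The function name `masked_copy_chunked` and its `chunk` parameter are hypothetical, and the default of 1024 is an arbitrary choice.

```python
import numpy as np

def masked_copy_chunked(A, B, mask, start, chunk=1024):
    """Copy the rows of B where mask is True into A, starting at row `start`.

    Hypothetical helper: B is processed `chunk` rows at a time, so the
    temporary created by boolean indexing holds at most `chunk` rows.
    Returns the row just past the last one written (i.e. `end`).
    """
    pos = start
    for lo in range(0, len(B), chunk):
        hi = lo + chunk
        # Boolean indexing still copies, but only the selected rows of this block.
        block = B[lo:hi][mask[lo:hi]]
        A[pos:pos + len(block)] = block
        pos += len(block)
    return pos
```

Larger chunks approach the speed of `A[start:end] = B[mask]`; smaller ones approach the memory footprint of the explicit loop.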

Sven Marnach
    Surely, "yielding a new array containing the selected rows of B and occupying additional memory" is a non sequitur? It's up to `B.__getitem__()` to choose what it wants to return. For example, if `mask` were a `slice`, a proxy (view) would be returned, and no copy would take place. – NPE May 11 '11 at 11:52
  • @aix: According to the OP, `mask` is a one-dimensional Boolean array. Did I miss anything? – Sven Marnach May 11 '11 at 12:12
  • @aix: Oh, I see. The part with the language definition is a bit ambiguous. It was only meant to refer to the part "first evaluate the right hand side". – Sven Marnach May 11 '11 at 12:14
  • Yes, I think we understand each other. – NPE May 11 '11 at 12:18
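The distinction raised in the comments (a slice makes `__getitem__` return a view, while a boolean mask makes it return a new array) can be checked directly with `np.shares_memory`. A small sketch with arbitrary example arrays:

```python
import numpy as np

B = np.arange(12).reshape(4, 3)
mask = np.array([True, False, True, False])

sliced = B[0:2]    # basic slicing: a view into B's buffer
masked = B[mask]   # boolean (advanced) indexing: a freshly allocated array

print(np.shares_memory(sliced, B))   # True
print(np.shares_memory(masked, B))   # False

# Writing into the masked result does not touch B, confirming it is a copy.
masked[0, 0] = -1
print(B[0, 0])                       # 0
```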

Using a boolean array as an index is fancy indexing, so numpy needs to make a copy. If you run into memory problems, you could write a Cython extension to deal with it.
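Before reaching for Cython, one possible pure-numpy alternative (a sketch, not checked against every numpy version) is `np.take` with `out=` pointing at `A[start:end]`. It avoids materializing `B[mask]` as a row temporary, but still allocates an integer index array of length `sum(mask)`, which is smaller than the selected rows whenever each row is wider than one index. The numpy documentation notes that `out` is always buffered when `mode='raise'` (the default), hence `mode='clip'` here. Assumptions: A and B have the same dtype, A is C-contiguous (so `A[start:end]` is too), and A does not overlap B; the helper name `masked_copy_take` is hypothetical.

```python
import numpy as np

def masked_copy_take(A, B, mask, start):
    """Write the rows of B where mask is True into A[start:end]; return end.

    Hypothetical helper. Assumes A.dtype == B.dtype, A is C-contiguous,
    and A does not share memory with B.
    """
    idx = np.flatnonzero(mask)   # intp indices of the selected rows
    end = start + idx.size
    # Every index from flatnonzero is in range, so clipping never changes one;
    # mode='clip' only avoids the buffering that mode='raise' applies to `out`.
    np.take(B, idx, axis=0, out=A[start:end], mode='clip')
    return end
```

If the assumptions do not hold, numpy may still fall back to an internal temporary, so it is worth measuring on the real data.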

tillsten