I was expecting to just say something like

ma.zeros(my_shape, mask=my_mask, hard_mask=True)

(where the mask is the correct shape) but ma.zeros (or ma.ones or ma.empty) rather surprisingly doesn't recognise the mask argument. The simplest I've come up with is

ma.array(np.zeros(my_shape), mask=my_mask, hard_mask=True)

which seems to involve unnecessary copying of lots of zeros. Is there a better way?

1 Answer

Make a masked array:

In [162]: x = np.arange(5); mask = np.array([1, 0, 0, 1, 0], bool)
In [163]: M = np.ma.MaskedArray(x,mask)

In [164]: M
Out[164]: 
masked_array(data=[--, 1, 2, --, 4],
             mask=[ True, False, False,  True, False],
       fill_value=999999)

Modify x, and see the result in M:

In [165]: x[-1] = 10

In [166]: M
Out[166]: 
masked_array(data=[--, 1, 2, --, 10],
             mask=[ True, False, False,  True, False],
       fill_value=999999)

In [167]: M.data
Out[167]: array([ 0,  1,  2,  3, 10])

In [169]: M.data.base
Out[169]: array([ 0,  1,  2,  3, 10])

M.data is a view of the array used to create it, so no unnecessary copies are made.
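The same check can be applied to the expression from the question. This is only a sketch: my_shape and my_mask are made-up stand-ins for the asker's values, and np.shares_memory is used to confirm that the masked array reuses the buffer of the zeros array rather than copying it.

```python
import numpy as np

# Hypothetical stand-ins for the question's my_shape and my_mask.
my_shape = (3, 4)
my_mask = np.zeros(my_shape, dtype=bool)
my_mask[0, 0] = True

zeros = np.zeros(my_shape)
M = np.ma.array(zeros, mask=my_mask, hard_mask=True)

# True when M's data and `zeros` occupy the same memory, i.e. no copy was made.
shares = np.shares_memory(M.data, zeros)
print(shares)
```

So the only allocation of zeros is the np.zeros call itself; wrapping it in ma.array does not duplicate it.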

I haven't used functions like np.ma.zeros, but

In [177]: np.ma.zeros
Out[177]: <numpy.ma.core._convert2ma at 0x1d84a052af0>

_convert2ma is a Python class that takes a funcname and returns a new callable wrapping the corresponding NumPy function. It does not add a mask parameter. Study its source yourself if necessary.
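If you still want to start from np.ma.zeros, one possible workaround (a sketch, not something the wrapper offers directly) is to attach the mask afterwards and then harden it. The names my_shape and my_mask are placeholders for the values in the question.

```python
import numpy as np

# Placeholder shape and mask, standing in for the question's values.
my_shape = (5,)
my_mask = np.array([1, 0, 0, 1, 0], dtype=bool)

m = np.ma.zeros(my_shape)   # masked array of zeros, nothing masked yet
m.mask = my_mask            # attach the mask after construction
m.harden_mask()             # switch to a hard mask

print(m.hardmask)
print(m)
```

This takes three statements instead of one, so the ma.array one-liner from the question may still be the more readable choice.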

np.ma.MaskedArray, the class that actually subclasses ndarray, takes a copy parameter

copy : bool, optional
        Whether to copy the input data (True), or to use a reference instead.
        Default is False.

and the first line of its __new__ is

    _data = np.array(data, dtype=dtype, copy=copy,
                     order=order, subok=True, ndmin=ndmin)

I haven't quite sorted out whether M._data is just a reference to the source data, or a view. In either case, it isn't a copy, unless you say so.
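A small sketch of the difference the copy parameter makes, reusing the x and mask from the session above. np.shares_memory only answers whether the buffers overlap; it does not distinguish a reference from a view.

```python
import numpy as np

x = np.arange(5)
mask = np.array([1, 0, 0, 1, 0], dtype=bool)

shared = np.ma.MaskedArray(x, mask)             # copy defaults to False
copied = np.ma.MaskedArray(x, mask, copy=True)  # explicit copy of the data

x[-1] = 10  # modify the source array after construction

shared_last = int(shared.data[-1])  # follows the change to x
copied_last = int(copied.data[-1])  # keeps the value from construction time

print(np.shares_memory(shared.data, x), shared_last)
print(np.shares_memory(copied.data, x), copied_last)
```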

I haven't worked a lot with masked arrays, but my impression is that, while they can be convenient, they shouldn't be used where you are concerned about performance. There's a lot of extra work required to maintain both the mask and the data. The extra time involved in copying the data array, if any, will be minor.

hpaulj