
I am a bit surprised that np.ma.masked_equal and np.ma.masked_values do not create a mask array filled with False when the value is not in the array, but return a scalar mask instead.

Example :

y = np.arange(10)
yy = np.ma.masked_equal(y,0)

yields a masked array whose mask is an array of 10 boolean values (True for the entry equal to 0, False everywhere else), while

y = np.arange(1,10) 
yy = np.ma.masked_equal(y,0)

yields a masked array with the mask set to the scalar False. Since my code does not know beforehand whether the masked value matches any entry in the array, I am forced to check explicitly:

yy = np.ma.masked_values(y, 0)
if np.isscalar(yy.mask):
    yy.mask = np.zeros(y.shape, dtype=bool)

This seems like unnecessary extra work to me. What is the reason for this behavior, and is there a way to avoid it?

MSeifert

1 Answer


You can simply create the MaskedArray yourself:

yy = np.ma.MaskedArray(y, mask=(y==0))
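As a quick check of the line above, here is a minimal sketch that reuses the question's `y = np.arange(1, 10)`, which contains no zeros. It assumes the constructor stores the boolean array it is given without collapsing it; the variable names `mask_shape` and `any_masked` are only illustrative.

```python
import numpy as np

y = np.arange(1, 10)  # contains no 0, so nothing ends up masked

# Build the masked array directly from an explicit boolean mask.
yy = np.ma.MaskedArray(y, mask=(y == 0))

# The mask should keep the shape of the data even though it is all False.
mask_shape = yy.mask.shape
any_masked = bool(yy.mask.any())
print(mask_shape, any_masked)
```

With this construction there should be no need for a scalar check afterwards, since the mask is an array from the start.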

It seems that NumPy tries to minimize the memory requirements and speed up the computations in case the MaskedArray is unmasked.

numpy.ma.nomask

Value indicating that a masked array has no invalid entry. nomask is used internally to speed up computations when the mask is not needed.

If you check:

>>> np.ma.nomask
False
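To tie this back to the question, the following sketch checks whether the mask returned by `masked_equal` is the `np.ma.nomask` object when nothing matches. It uses the question's array; the names `is_nomask`, `is_nomask_via_getmask` and `mask_ndim` are just for illustration.

```python
import numpy as np

y = np.arange(1, 10)            # 0 does not occur in y
yy = np.ma.masked_equal(y, 0)   # nothing to mask

# Identity check against the module-level constant.
is_nomask = yy.mask is np.ma.nomask

# np.ma.getmask returns the stored mask, or nomask if there is none.
is_nomask_via_getmask = np.ma.getmask(yy) is np.ma.nomask

# A scalar mask has zero dimensions, unlike a full boolean mask array.
mask_ndim = np.ndim(yy.mask)
print(is_nomask, is_nomask_via_getmask, mask_ndim)
```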

So the single False represents "no mask". You could therefore also check maskedarr.mask is np.ma.nomask (it's a guaranteed constant):

yy = some_operation_that_creates_a_masked_array
if yy.mask is np.ma.nomask:
    yy.mask = np.zeros(yy.shape, dtype=bool)

That carries a bit more context than np.isscalar.
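Two other ways to avoid the explicit check, sketched here as suggestions rather than as part of the original answer: `np.ma.getmaskarray` returns a full boolean array even when the mask is `nomask`, and `np.ma.masked_values` accepts a `shrink` argument that can be set to False. Note that `masked_equal` does not take that argument, as far as I know. The names `full_mask` and `zz` are illustrative.

```python
import numpy as np

y = np.arange(1, 10)  # 0 does not occur in y

# Option 1: leave the masked array as it is and ask for a full mask on demand.
yy = np.ma.masked_equal(y, 0)
full_mask = np.ma.getmaskarray(yy)  # boolean array with the shape of yy

# Option 2: tell masked_values not to collapse an all-False mask.
zz = np.ma.masked_values(y, 0, shrink=False)

print(full_mask.shape, zz.mask.shape)
```

Option 1 keeps the memory saving of `nomask` until a full mask is really needed; option 2 allocates the full mask up front.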

Community
MSeifert
    Thanks a lot. I guess it makes sense, and I was probably too keen on using masked_equal or masked_values, but I am not sure I understand why NumPy tries to minimize memory requirements in those functions and not when the masked array is created directly. – Johann cohen-tanugi May 30 '17 at 08:01