
I would expect the result of a summation for a fully masked array to be zero, but instead "masked" is returned. How can I get the function to return zero?

>>> a = np.asarray([1, 2, 3, 4])
>>> b = np.ma.masked_array(a, mask=~(a > 2))
>>> b
masked_array(data = [-- -- 3 4],
             mask = [ True  True False False],
       fill_value = 999999)

>>> b.sum()
7
>>> b = np.ma.masked_array(a, mask=~(a > 5))
>>> b
masked_array(data = [-- -- -- --],
         mask = [ True  True  True  True],
   fill_value = 999999)


>>> b.sum()
masked
>>> np.ma.sum(b)
masked
>>> 

Here's another unexpected thing:

>>> b.sum() + 3
masked
orange

1 Answer


In your last case (b1 here is your fully masked b):

In [197]: bs=b1.sum()
In [198]: bs.data
Out[198]: array(0.0)
In [199]: bs.mask
Out[199]: array(True, dtype=bool)
In [200]: repr(bs)
Out[200]: 'masked'
In [201]: str(bs)
Out[201]: '--'

If I specify keepdims, I get a different array:

In [208]: bs=b1.sum(keepdims=True)
In [209]: bs
Out[209]: 
masked_array(data = [--],
             mask = [ True],
       fill_value = 999999)
In [210]: bs.data
Out[210]: array([0])
In [211]: bs.mask
Out[211]: array([ True], dtype=bool)

Here's the relevant part of the sum code:

def sum(self, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}

    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    # No explicit output
    if out is None:
        result = self.filled(0).sum(axis, dtype=dtype, **kwargs)
        rndim = getattr(result, 'ndim', 0)
        if rndim:
            result = result.view(type(self))
            result.__setmask__(newmask)
        elif newmask:
            result = masked
        return result
    ....

It's the

 newmask = np.ma.core._check_mask_axis(b1.mask, axis=None)
 ...
 elif newmask: result = masked

lines that produce the masked value in your case. newmask is True when all values are masked, and False if some are not. The choice to return np.ma.masked is deliberate.
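The decision described above can be imitated with public functions only. This is a rough sketch for the axis=None case, not the library's implementation; the helper name sum_like_ma is made up for illustration, and np.ma.getmaskarray stands in for the private _check_mask_axis call.

```python
import numpy as np

def sum_like_ma(arr):
    """Hypothetical helper: mimic the scalar branch of MaskedArray.sum (axis=None)."""
    # Stand-in for newmask: True only if every element is masked.
    all_masked = bool(np.ma.getmaskarray(arr).all())
    # The actual arithmetic: masked entries are replaced by 0 before summing.
    total = arr.filled(0).sum()
    # Same choice as the library code: a fully masked input yields np.ma.masked.
    return np.ma.masked if all_masked else total

a = np.asarray([1, 2, 3, 4])
partly = np.ma.masked_array(a, mask=~(a > 2))  # only 3 and 4 are unmasked
fully = np.ma.masked_array(a, mask=~(a > 5))   # everything is masked

partly_result = sum_like_ma(partly)
fully_result = sum_like_ma(fully)
```

With the arrays from the question, partly_result is an ordinary number and fully_result is the np.ma.masked constant.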

The core of the calculation is:

In [218]: b1.filled(0).sum()
Out[218]: 0

The rest of the code decides whether to return a scalar or a masked array.
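If zero is the result you want for a fully masked array, one option is to run that core calculation yourself. A minimal sketch using the arrays from the question (the variable names are mine):

```python
import numpy as np

a = np.asarray([1, 2, 3, 4])
partly = np.ma.masked_array(a, mask=~(a > 2))  # only 3 and 4 are unmasked
fully = np.ma.masked_array(a, mask=~(a > 5))   # everything is masked

# filled(0) returns a plain ndarray with masked entries replaced by 0,
# so the sum is always an ordinary number.
partial_total = partly.filled(0).sum()
full_total = fully.filled(0).sum()
print(partial_total, full_total)
```

The trade-off is that a result of 0 no longer tells you whether any unmasked data went into the sum.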

============

And for your addition:

In [232]: np.ma.masked+3
Out[232]: masked

It looks like np.ma.masked is a special constant that propagates itself through calculations, somewhat like np.nan.
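Because the result is that one constant, it can be detected before it spreads into later arithmetic. A small sketch, assuming you want to fall back to 0 (the variable names are illustrative):

```python
import numpy as np

a = np.asarray([1, 2, 3, 4])
fully = np.ma.masked_array(a, mask=~(a > 5))  # everything is masked

total = fully.sum()

# The sum of a fully masked array is the np.ma.masked constant itself.
sum_is_masked = total is np.ma.masked

# Arithmetic on it stays masked, much as nan + 3 stays nan.
still_masked = bool(np.ma.is_masked(total + 3))

# Replace the constant with 0 explicitly before using the value further.
safe_total = 0 if total is np.ma.masked else total
```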

hpaulj
  • I'd expect a scalar value to be returned, not an array (it's the sum over all dimensions that I am looking for). – orange Dec 14 '16 at 00:37
  • I added the relevant part of the `sum` code. I can understand why there might be some ambiguity in the returned value when the input is fully masked. I'm not privy to any debates as to whether this is right or not. If you don't like the result you could apply sum to `b.filled(0)`. – hpaulj Dec 14 '16 at 00:46
  • Looks like the reasoning is, 'if all values are masked, it can't give a proper scalar value (even if the starting value of the sum is 0)'. – hpaulj Dec 14 '16 at 01:09
  • Interesting. Thanks for digging this up. Your comparison with `np.nan` is spot on. I guess I can always do `b.filled(0).sum()` as you wrote. – orange Dec 14 '16 at 01:13
  • @hpaulj Yes, I think that is the reasoning. "Invalid" is a recurring word in the masked array documentation, and a scalar such as 0 could be a valid return value in general, so they are probably being cautious. – fedepad Dec 14 '16 at 01:14
  • @fedepad @hpaulj: I get that they treat it similar to `np.nan`, but then there should be a special `sum()` function like `nansum()`. – orange Dec 14 '16 at 01:16