Can anyone explain the following result to me? I know this is not how one would usually do this operation, but I found the result odd.

import numpy as np

a = np.ma.masked_where(np.arange(20)>10,np.arange(20))
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
c = np.zeros(a.shape)
d = np.zeros(a.shape)

c[~a.mask] += b[~a.mask]

print(b[~a.mask])
#masked_array(data=[--, --, --, --, --, --, --, --,--, --, --],
#             mask=[ True,  True,  True,  True,  True,  True,  True,  True, True,  True,  True],
#       fill_value=999999,
#            dtype=int64)

print(c)
#[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.  0.  0.  0.  0. 0.  0.  0.  0.  0.]

d[~a.mask] = d[~a.mask] + b[~a.mask]

print(d)
#[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

I expected c not to change, but I guess something related to objects in memory is going on here. Also, += keeps the original object, while = combined with + creates a new d.

I just don't understand where the data that is added to c comes from.

Ehsan
  • The data that's added to `c` comes from `b` rather than `a` (as can be seen if you give them different values). The curious part is that `np.ndarray.__iadd__` ignores the mask of the other array, whereas `np.ndarray.__add__` takes the mask into account. There is a more minimal example than `c[~a.mask] += b[~a.mask]`, though: simply `c += b`. The boolean indexing is not fundamental to this behaviour. – alani Aug 21 '20 at 22:13
  • If `c` itself was a masked array, you'd get the expected results. But since it's an ordinary array, the `+=` is performed with `b.data`. In general, don't mix masked and unmasked operations and arrays. If you want to preserve the effect of the mask, use masked methods and functions. – hpaulj Aug 22 '20 at 00:01
  • Thanks. Yes, it makes sense that `b.data` is used. As I mentioned, I do actually use a masked array in my real (more complicated) script as a placeholder (`c` in the question). – pythonewbie Aug 22 '20 at 01:55

1 Answer

I will start with a simpler example for better understanding:

b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
#b: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#b.data: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
c = np.zeros(b.shape)
#c: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d = np.zeros(b.shape)
#d: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

c += b
#c: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]

d = d + b
#d: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#d.data: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

The first operation, c += b, is an in-place operation. In other words, it is equivalent to c = type(c).__iadd__(c, b), which performs the addition according to the type of c. Since c is a plain ndarray and not a masked array, the mask of b is ignored and the underlying data of b (b.data) is added as if it were unmasked.
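The in-place case can be checked with a short sketch. It assumes a NumPy version in which MaskedArray does not intercept ufunc calls made on a plain array (true of the releases I am aware of); the name c_before and the 5-element size are illustrative only:

```python
import numpy as np

# Fully masked array, built as in the question but shortened to 5 elements.
b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))
c = np.zeros(b.shape)
c_before = c  # second reference to the same ndarray (illustrative name)

c += b  # in-place: handled by ndarray, which only sees b's underlying data

same_object = c is c_before                 # no new array was created
still_plain = type(c) is np.ndarray         # c was not turned into a masked array
added_raw_data = np.array_equal(c, b.data)  # the "masked" values were added anyway
print(same_object, still_plain, added_raw_data)
```

All three flags should come out True, which is the behaviour observed for c in the question.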

On the other hand, d = d + b is NOT an in-place operation. Because MaskedArray is a subclass of ndarray, Python tries the reflected method of the right-hand operand first, so the statement is equivalent to d = np.ma.MaskedArray.__radd__(b, d). This creates a new object: the result is a masked array (because b is a masked array), and the addition uses valid values only (in this case there are none, since ALL elements of b are masked and invalid). The name d is then rebound to this new masked array, which has the same mask as b, while the data under the mask is carried over from the old d unchanged.
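A small sketch of the out-of-place case, under the same assumptions as the answer's example (the names d_before and explicit are introduced here for illustration):

```python
import numpy as np

b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))  # every element masked
d = np.zeros(b.shape)
d_before = d  # keep a reference to the original ndarray (illustrative name)

# Python tries b.__radd__(d) first because type(b) is a subclass of type(d).
d = d + b

# The same call spelled out explicitly.
explicit = np.ma.MaskedArray.__radd__(b, d_before)

rebound = d is not d_before                      # d now names a new object
is_masked = isinstance(d, np.ma.MaskedArray)     # the result is a masked array
all_masked = bool(np.ma.getmaskarray(d).all())   # it inherits b's mask
explicit_is_masked = isinstance(explicit, np.ma.MaskedArray)
old_untouched = np.array_equal(d_before, np.zeros(5))  # the original array was not modified
print(rebound, is_masked, all_masked, explicit_is_masked, old_untouched)
```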

This difference between in-place and out-of-place operators is not NumPy-specific; it applies to Python itself too. The case in the question behaves the same way, and as @alaniwi mentioned in the comments, the boolean indexing with the mask of a is not fundamental to the behavior. Indexing b, c, and d with ~a.mask only limits the operation to the elements that are not masked in a (rather than all elements of the arrays) and nothing more.
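The same in-place versus out-of-place distinction can be seen with plain Python lists, with no NumPy involved (the variable names are illustrative):

```python
x = [1, 2]
alias = x            # second name for the same list

x += [3]             # in-place: list.__iadd__ mutates the existing object
inplace_same_object = x is alias
alias_after_inplace = list(alias)   # [1, 2, 3], the alias sees the change

x = x + [4]          # out-of-place: list.__add__ builds a new list, x is rebound
rebound_new_object = x is not alias
alias_after_add = list(alias)       # still [1, 2, 3]
print(inplace_same_object, alias_after_inplace, rebound_new_object, alias_after_add)
```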

To make things a bit more interesting, and in fact clearer, let's swap the order of the operands on the right-hand side, putting b first and using a fresh array e:

e = np.zeros(b.shape)
#e: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

e = b + e
#e: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#e.data: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]

Note that, similar to d = d + b, the right-hand side uses the masked-array addition, so the output is a masked array. But since you are adding e to b (i.e. e = np.ma.MaskedArray.__add__(b, e)), the data of b is kept under the mask, whereas in d = d + b you are adding b to d and the data of d is kept.
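The effect of the operand order is easier to see when the plain array holds a recognizable value. The sketch below uses a hypothetical array x filled with 7.0. Which values sit under the mask is an implementation detail of numpy.ma, observed in the outputs above, and probably should not be relied upon in real code:

```python
import numpy as np

b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))  # data 0..4, all masked
x = np.full(b.shape, 7.0)  # plain ndarray with a recognizable value (illustrative)

plain_first = x + b   # x is the first operand
masked_first = b + x  # b is the first operand

# Both results are fully masked masked arrays...
both_masked = bool(np.ma.getmaskarray(plain_first).all()
                   and np.ma.getmaskarray(masked_first).all())
# ...but the hidden data comes from whichever operand came first.
keeps_x = np.array_equal(plain_first.data, x)
keeps_b = np.array_equal(masked_first.data, b.data)
print(both_masked, keeps_x, keeps_b)
```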

Ehsan
  • One thing that is still not totally clear: do you know why d is still a numpy.ndarray and not a numpy.ma.core.MaskedArray if you use my original code? (Python 3.7.3, numpy 1.16.2) – pythonewbie Aug 22 '20 at 02:04
  • @pythonewbie Good point. When you assign to elements (and not to the name itself), you are setting items of the existing array rather than rebinding the whole object. For that reason, when you index using masks, the array type does not change, but the masked-array addition is still used. In other words, `d[i] = d[i] + b[i]` is equivalent to `d.__setitem__(i, np.ma.MaskedArray.__radd__(b.__getitem__(i), d.__getitem__(i)))`, and hence the invalid values are not added to the values of `d`. – Ehsan Aug 22 '20 at 02:32
  • Thanks again! My question got completely answered within a couple of hours. Awesome stuff. – pythonewbie Aug 22 '20 at 08:25
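The indexed assignment discussed in the comments above can be reproduced with the arrays from the question. This is a sketch only: the names idx and rhs are introduced here for illustration, and the outcome matches the output for d shown in the question:

```python
import numpy as np

a = np.ma.masked_where(np.arange(20) > 10, np.arange(20))
b = np.ma.masked_where(np.arange(20) > -1, np.arange(20))
d = np.zeros(a.shape)

idx = ~a.mask            # boolean index: elements not masked in a
rhs = d[idx] + b[idx]    # masked-array addition, yields a fully masked array
d[idx] = rhs             # item assignment writes into the existing ndarray

rhs_is_masked = isinstance(rhs, np.ma.MaskedArray)
d_is_plain = type(d) is np.ndarray              # d was never rebound, so its type is unchanged
d_unchanged = np.array_equal(d, np.zeros(20))   # the masked values of b were not added
print(rhs_is_masked, d_is_plain, d_unchanged)
```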