in numpy, what is the difference between calling MA.masked_where and MA.masked_array?

Question

Calling masked_array (the class constructor) and the masked_where function both seem to do exactly the same thing, in terms of being able to construct a numpy masked array given the data and mask values. When would you use one or the other?

>>> import numpy as np
>>> import numpy.ma as MA

>>> vals = np.array([0,1,2,3,4,5])
>>> cond = vals > 3

>>> vals
array([0, 1, 2, 3, 4, 5])

>>> cond
array([False, False, False, False,  True,  True], dtype=bool)

>>> MA.masked_array(data=vals, mask=cond)
masked_array(data = [0 1 2 3 -- --],
             mask = [False False False False  True  True],
       fill_value = 999999)

>>> MA.masked_where(cond, vals)
masked_array(data = [0 1 2 3 -- --],
             mask = [False False False False  True  True],
       fill_value = 999999)

The optional argument copy to masked_where (its only documented optional argument) is also supported by masked_array, so I don't see any options that are unique to masked_where. Although the converse is not true (e.g. masked_where doesn't support dtype), I don't understand the purpose of masked_where as a separate function.

score 3 · Answer 1 · answered Feb 27 '23 at 14:26

masked_array is an alias for the MaskedArray class. When you use it, you call the class constructor directly, without the extra checks that masked_where performs.

masked_where is a function that checks your parameters and then creates an instance of MaskedArray.

def masked_where(condition, a, copy=True):
    """[docstring]"""
    # Make sure that condition is a valid standard-type mask.
    cond = make_mask(condition, shrink=False)
    a = np.array(a, copy=copy, subok=True)


    (cshape, ashape) = (cond.shape, a.shape)
    if cshape and cshape != ashape:
        raise IndexError("Inconsistent shape between the condition and the input"
                         " (got %s and %s)" % (cshape, ashape))
    if hasattr(a, '_mask'):
        cond = mask_or(cond, a._mask)
        cls = type(a)
    else:
        cls = MaskedArray
    result = a.view(cls)
    # Assign to *.mask so that structured masks are handled correctly.
    result.mask = _shrink_mask(cond)
    # There is no view of a boolean so when 'a' is a MaskedArray with nomask
    # the update to the result's mask has no effect.
    if not copy and hasattr(a, '_mask') and getmask(a) is nomask:
        a._mask = result._mask.view()
    return result
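
As a minimal sketch of the mask_or branch in the source above (the array values are made up for illustration), applying masked_where to an array that is already masked should merge the new condition with the existing mask rather than replace it:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical input: the first element is already masked.
already_masked = ma.masked_array(
    [0, 1, 2, 3, 4, 5],
    mask=[True, False, False, False, False, False],
)
cond = already_masked.data > 3

# masked_where ORs the condition with the existing mask (the mask_or branch).
combined = ma.masked_where(cond, already_masked)
print(combined.mask)
```

This only illustrates what the quoted function does with a masked input; it is not meant as a contrast with masked_array.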

Thanks, that helps, but can you give an example of where the two would give different results because of this? If I call them with inconsistently shaped value and masked arrays, I get the same error message in both cases. — alani, Feb 27 '23 at 14:43
Thanks very much. I've accepted hpaulj's answer as it gives an example of when the results differ, but your answer is also helpful for understanding what is going on. — alani, Feb 28 '23 at 18:21
It doesn't matter, I also upvoted for @hpaulj's answer and for your interesting question. — Corralien, Feb 28 '23 at 21:32

score 2 · Accepted Answer · answered Feb 27 '23 at 16:42

You commented:

If I call them with inconsistently shaped value and masked arrays, I get the same error message in both cases.

I don't think we can help you without more details on what's different.

For example if I try the obvious inconsistency, that of length, I get different error messages:

In [121]: np.ma.masked_array(vals, cond[:-1])
MaskError: Mask and data not compatible: data size is 5, mask size is 4.
In [122]: np.ma.masked_where(cond[:-1], vals)
IndexError: Inconsistent shape between the condition and the input (got (4,) and (5,))

The test that produces the masked_where message is visible in the code that Corralien shows.

The MaskedArray class definition has this test:

        # Make sure the mask and the data have the same shape
        if mask.shape != _data.shape:
            (nd, nm) = (_data.size, mask.size)
            if nm == 1:
                mask = np.resize(mask, _data.shape)
            elif nm == nd:
                mask = np.reshape(mask, _data.shape)
            else:
                msg = "Mask and data not compatible: data size is %i, " + \
                      "mask size is %i."
                raise MaskError(msg % (nd, nm))

I'd expect the same message only if the shapes made it past the masked_where test but were caught by the class's test. If so, that should be obvious in the full error traceback.

Here's an example that fails in masked_where but passes in the base class.

In [138]: np.ma.masked_where(cond[:,None],vals)
IndexError: Inconsistent shape between the condition and the input (got (5, 1) and (5,))
In [139]: np.ma.masked_array(vals, cond[:,None])
Out[139]: 
masked_array(data=[--, 1, --, 3, --],
             mask=[ True, False,  True, False,  True],
       fill_value=999999)

The base class can handle cases where cond differs in shape but matches in size (total number of elements): it tries to reshape the mask. A scalar cond passes both, though the exact test differs.

Based on my reading of the code, I can't conceive of a case that passes masked_where but not the base class.
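
A small self-contained sketch of those three cases follows. The arrays are my own stand-ins, not the exact session above, and the behavior is assumed to match the code quoted in this thread:

```python
import numpy as np
import numpy.ma as ma

# Hypothetical data: five values, with the even ones to be masked.
vals = np.arange(5)
cond = vals % 2 == 0

# Case 1: different size. Both reject it, but with different exception types.
try:
    ma.masked_where(cond[:-1], vals)
except IndexError as exc:
    where_size_error = exc
try:
    ma.masked_array(vals, mask=cond[:-1])
except ma.MaskError as exc:
    array_size_error = exc

# Case 2: same size, different shape. masked_where compares shapes and raises;
# the class constructor compares sizes and reshapes the mask instead.
try:
    ma.masked_where(cond[:, None], vals)
    where_shape_error = None
except IndexError as exc:
    where_shape_error = exc
reshaped = ma.masked_array(vals, mask=cond[:, None])

# Case 3: a scalar condition is accepted by both.
scalar_where = ma.masked_where(True, vals)
scalar_array = ma.masked_array(vals, mask=True)

print(type(where_size_error).__name__, type(array_size_error).__name__)
print(reshaped)
```

So the practical difference is in which inputs are rejected and how, not in the result for a well-formed mask.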

All the masked array code is readable Python (see the link in the other answer). While there is one base class definition, there are a number of constructor or helper functions, as the masked_where docs make clear. I wouldn't worry too much about which function(s) to use, especially if you aren't trying to push the boundaries of what's logical.

Masked arrays, while a part of numpy for a long time, do not get a whole lot of use, at least judging by the relative lack of SO questions. I suspect pandas has largely replaced them when dealing with data that can have missing values (e.g. time series).

Thank you. I will mark this as accepted because it contains a specific example of where the output may differ depending which is used. By the way, I now can't reproduce my observation that the two gave the same error (my comment on the other answer, which you quoted), but maybe somehow it was an environment with a really old numpy, as I distinctly remember the misspelling "Inconsistant" in the error message, which I see used to exist in the code base. — alani, Feb 28 '23 at 18:18
