You comment:

> If I call them with inconsistently shaped value and masked arrays, I get the same error message in both cases.
I don't think we can help you without more details on what's different.
For example, if I try the most obvious inconsistency, a length mismatch, I get different error messages:
```
In [121]: np.ma.masked_array(vals, cond[:-1])
MaskError: Mask and data not compatible: data size is 5, mask size is 4.

In [122]: np.ma.masked_where(cond[:-1], vals)
IndexError: Inconsistent shape between the condition and the input (got (4,) and (5,))
```
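To reproduce the two messages yourself, here is a minimal sketch. The question doesn't show `vals` and `cond`, so the ones below are assumptions, chosen to match the `masked_array` output later in this answer (data `0..4`, even entries masked). `raised_by` is a throwaway helper of mine, not part of numpy.

```python
import numpy as np

# Assumed inputs (not shown in the question): values 0..4, mask the evens.
vals = np.arange(5)
cond = vals % 2 == 0

def raised_by(func, *args):
    """Hypothetical helper: return the exception func(*args) raises, or None."""
    try:
        func(*args)
    except Exception as exc:
        return exc
    return None

# Drop the last mask element so the condition is one element short.
base_err = raised_by(np.ma.masked_array, vals, cond[:-1])
where_err = raised_by(np.ma.masked_where, cond[:-1], vals)

print(type(base_err).__name__, base_err)
print(type(where_err).__name__, where_err)
```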
The test behind the `masked_where` message is obvious from the code that Corralien shows.
The `MaskedArray` class definition has this test:
```python
# Make sure the mask and the data have the same shape
if mask.shape != _data.shape:
    (nd, nm) = (_data.size, mask.size)
    if nm == 1:
        mask = np.resize(mask, _data.shape)
    elif nm == nd:
        mask = np.reshape(mask, _data.shape)
    else:
        msg = "Mask and data not compatible: data size is %i, " + \
              "mask size is %i."
        raise MaskError(msg % (nd, nm))
```
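To make the three branches of that check concrete, here is a small standalone sketch. `reconcile_mask` is a hypothetical helper of my own, not part of `numpy.ma`; it mirrors the excerpt (with the data's shape passed in directly) but is not the library's code.

```python
import numpy as np

def reconcile_mask(mask, data_shape):
    """Hypothetical helper mirroring MaskedArray's mask/data shape check.

    Returns a boolean mask with shape data_shape, or raises np.ma.MaskError.
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != data_shape:
        nd, nm = int(np.prod(data_shape)), mask.size
        if nm == 1:
            # A single value is repeated to fill the whole data shape.
            mask = np.resize(mask, data_shape)
        elif nm == nd:
            # Same number of elements, only the layout differs: reshape.
            mask = np.reshape(mask, data_shape)
        else:
            raise np.ma.MaskError(
                "Mask and data not compatible: data size is %i, "
                "mask size is %i." % (nd, nm))
    return mask

print(reconcile_mask(True, (5,)))                        # one element, resized
print(reconcile_mask([[1], [0], [1], [0], [1]], (5,)))   # (5, 1), reshaped to (5,)
```

Only a size mismatch other than 1 reaches the `MaskError` branch; any shape difference alone is silently repaired.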
I'd expect the same message only if the shapes made it past the `masked_where` test but were caught by the class's test. If so, that should be obvious from the full error traceback.
Here's an example that fails in `masked_where` but passes the base class:
```
In [138]: np.ma.masked_where(cond[:,None],vals)
IndexError: Inconsistent shape between the condition and the input (got (5, 1) and (5,))

In [139]: np.ma.masked_array(vals, cond[:,None])
Out[139]:
masked_array(data=[--, 1, --, 3, --],
             mask=[ True, False, True, False, True],
       fill_value=999999)
```
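The base class can handle cases where `cond` differs in shape but matches in size (total number of elements): it reshapes the mask to the data's shape. A scalar `cond` passes both, though the exact test differs: the base class resizes any one-element mask, while `masked_where` skips its shape comparison when the condition is 0-d.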
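A quick check of those cases. `vals` and `cond` are my guesses at the question's inputs (data `0..4`, even entries masked), not something the question shows.

```python
import numpy as np

vals = np.arange(5)          # assumed data
cond = vals % 2 == 0         # assumed condition: mask the even entries

# Same size, different shape: the base class reshapes the (5, 1) mask to (5,).
col = np.ma.masked_array(vals, cond[:, None])
print(col.mask.shape, col.mask)

# One-element mask: the base class resizes it to cover every element.
one = np.ma.masked_array(vals, [True])
print(one.mask)

# A scalar (0-d) condition also gets past masked_where's shape check.
scalar = np.ma.masked_where(True, vals)
print(scalar.mask)
```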
Based on my reading of the code, I can't think of a mismatch that passes `masked_where` but not the base class.
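One way to probe that claim empirically is to push a handful of condition shapes through both constructors and compare which ones each accepts. This is a spot check with assumed data, not a proof, and `accepts` is just a throwaway helper.

```python
import numpy as np

vals = np.arange(5)  # assumed data (not given in the question)

def accepts(func, *args):
    """Hypothetical helper: True if func(*args) runs without raising."""
    try:
        func(*args)
    except Exception:
        return False
    return True

# A few candidate condition shapes, including 0-d, size-1 and transposed ones.
shapes = [(), (1,), (4,), (5,), (6,), (5, 1), (1, 5)]
where_ok = {s for s in shapes
            if accepts(np.ma.masked_where, np.ones(s, dtype=bool), vals)}
base_ok = {s for s in shapes
           if accepts(np.ma.masked_array, vals, np.ones(s, dtype=bool))}

print("masked_where accepts:", sorted(where_ok))
print("masked_array accepts:", sorted(base_ok))
print("accepted by masked_where only:", sorted(where_ok - base_ok))
```

If the reading of the code is right, the last line prints an empty list.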
All the masked array code is readable Python (see the link in the other answer). While there is one base class definition, there are a number of constructor and helper functions, as the `masked_where` docs make clear. I wouldn't worry too much about which function(s) to use, especially if you aren't trying to push the boundaries of what's logical.
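For ordinary, well-shaped inputs the different entry points give the same result. A sketch comparing `np.ma.masked_where`, `np.ma.masked_array` and `np.ma.array`, with assumed `vals` and `cond`:

```python
import numpy as np

vals = np.arange(5)          # assumed data
cond = vals % 2 == 0         # assumed condition, same shape as vals

# Three ways to build the same masked array from a matching-shape condition.
a = np.ma.masked_where(cond, vals)
b = np.ma.masked_array(vals, mask=cond)
c = np.ma.array(vals, mask=cond)

for m in (a, b, c):
    print(m)
```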
Masked arrays, while part of `numpy` for a long time, don't get a whole lot of use, at least judging by the relative lack of SO questions. I suspect `pandas` has largely replaced them when dealing with data that can have missing values (e.g. time series).