What about something like:
import numpy as np
from numpy.ma import masked_array
data = masked_array(data=[7, 0, 7, 1, 8, 0, 1, 1, 0, 0, 3, 0, 0, 3, 0],
                    mask=[False, True, False, False, False, True, False, False, True, True, False, True, True, False, True])
flag = masked_array(data=[True, False, False, True, 0, 0, 0, False, 0, True, 0, 0, 0, 0, True],
                    mask=[False, False, False, False, True, True, True, False, True, False, True, True, True, True, False])
print(repr(data))
print(repr(flag))
indices = np.where(flag & ~flag.mask)
print(data[indices])
Note that you may run into trouble if the masked values in flag can't be combined using &, but that doesn't look like the case for you.
Output:
masked_array(data = [7 -- 7 1 8 -- 1 1 -- -- 3 -- -- 3 --],
mask = [False True False False False True False False True True False True True False True],
fill_value = 999999)
masked_array(data = [1 0 0 1 -- -- -- 0 -- 1 -- -- -- -- 1],
mask = [False False False False True True True False True False True True True True False],
fill_value = 999999)
[7 1 -- --]
Edit:
An alternative way of getting the indices might also be:
indices = np.where(flag.filled(False))
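To check that the two ways of getting the indices agree, here is a minimal sketch. It assumes a small boolean flag array made up for illustration (not the data above), and it applies & to flag.data rather than to the masked array itself, which is a slight variant of the first approach.

```python
import numpy as np
from numpy.ma import masked_array

# Small illustrative flag array; the values are made up for this sketch.
flag = masked_array(
    data=[True, False, False, True, False, True],
    mask=[False, False, True, True, False, False],
)

# Approach 1: replace masked entries with False, then look for True values.
idx_filled = np.where(flag.filled(False))[0]

# Approach 2: combine the raw data with the inverted mask explicitly.
idx_explicit = np.where(flag.data & ~flag.mask)[0]

print(idx_filled)    # [0 5]
print(idx_explicit)  # [0 5]
```

Index 3 holds True in the underlying data but is masked, so neither approach selects it.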
Update (Edit 2):
Beware of the subtleties of indexing arrays using arrays.
Consider the following code:
import numpy as np
data = np.array([1,2,3,4,5])
mask = np.array([True, False, True, False, True])
res = data[mask]
print(res)
As you might (or might not) expect, here the mask serves as a "filter", removing the elements of data where the corresponding location in the mask is False. Because of the values I chose for data and mask, the effect is that the indexing filters out the even data values, leaving only the odd ones.
The output here is: [1 3 5].
Now, consider the very similar code:
import numpy as np
data = np.array([1,2,3,4,5])
mask = np.array([1, 0, 1, 0, 1])
res = data[mask]
print(res)
Here, the only thing that changed is the datatype of the mask elements; their truth values are the same. Let's call the first mask (made up of True/False values) mask1 and the second mask (made up of 1/0 values) mask2.
You can inspect the datatype of an array through its dtype attribute (e.g. print(mask.dtype)). mask1 has a dtype of bool, while mask2 has an integer dtype (int32 or int64, depending on your platform).
Here, however, the output is different: [2 1 2 1 2].
What's going on here?
In fact, indexing behaves differently depending on the datatype of the array used to index. As mentioned, when the datatype of the "mask" is boolean, it serves a filtering function. But when the datatype of the "mask" is integral, it serves a "selection" function, using the elements of the index as indices of the original array.
So, in the second example, since data[1] = 2 and data[0] = 1, the result of data[mask2] is an array of length 5 (one element per index), not 3 as in the boolean case.
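One way to see how the two modes relate is to convert a boolean mask into the integer positions it selects. The sketch below uses np.flatnonzero for that; the variable names are just for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
bool_mask = np.array([True, False, True, False, True])

# Positions where the boolean mask is True.
positions = np.flatnonzero(bool_mask)
print(positions)        # [0 2 4]

# Indexing with those positions gives the same result as the boolean mask.
print(data[positions])  # [1 3 5]
print(data[bool_mask])  # [1 3 5]

# Integer indices may also repeat or reorder elements,
# which a boolean mask can never do.
print(data[np.array([4, 4, 0])])  # [5 5 1]
```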
Put another way, given the following code:
res = data[mask]
If mask has an integer dtype, the length of res will be equal to the length of mask.
If mask has a boolean dtype (mask.dtype == bool), the length of res will be equal to the number of True values in mask.
Quite a difference.
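The length rule can be checked directly. This is a small sketch using the same example values as above; int_index and bool_mask are names chosen here for clarity.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
bool_mask = np.array([True, False, True, False, True])
int_index = np.array([1, 0, 1, 0, 1])

filtered = data[bool_mask]   # boolean dtype: filtering
selected = data[int_index]   # integer dtype: selection by position

# Boolean case: one output element per True value in the mask.
print(len(filtered), bool_mask.sum())   # 3 3

# Integer case: one output element per index, regardless of its value.
print(len(selected), len(int_index))    # 5 5
```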
Lastly, you can coerce an array of one datatype to another using the astype method.
Demonstration snippet:
import numpy as np
data = np.array([1,2,3,4,5])
# Create a boolean mask
mask1 = np.array([True, False, True, False, True])
# Create an integer "mask", using the same logical values
mask2 = np.array([1,0,1,0,1])
# Coerce mask2 into a boolean mask
mask3 = mask2.astype(bool)
print(data) # [1 2 3 4 5]
print("-" * 80)
print(mask1) # [ True False  True False  True]
print(mask1.dtype) # bool
print(data[mask1]) # [1 3 5]
print("-" * 80)
print(mask2) # [1 0 1 0 1]
print(mask2.dtype) # int32 or int64, depending on platform
print(data[mask2]) # [2 1 2 1 2]
print("-" * 80)
print(mask3) # [ True False  True False  True]
print(mask3.dtype) # bool
print(data[mask3]) # [1 3 5]
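If you want to guard against accidentally indexing with a 0/1 integer array, one option is to coerce it before use. The helper below is a hypothetical sketch (as_boolean_mask is not a NumPy function). Note that astype(bool) on its own treats any nonzero value as True, so the helper also rejects arrays containing values other than 0 and 1.

```python
import numpy as np

def as_boolean_mask(mask, length):
    """Hypothetical helper: return `mask` as a boolean array of the given length."""
    mask = np.asarray(mask)
    if mask.shape != (length,):
        raise ValueError("mask must have one entry per data element")
    if mask.dtype == bool:
        return mask
    # Only accept arrays made purely of 0s and 1s; anything else is
    # more likely a list of positions than a mask.
    if not np.isin(mask, (0, 1)).all():
        raise ValueError("mask may only contain 0 and 1")
    return mask.astype(bool)

data = np.array([1, 2, 3, 4, 5])
result = data[as_boolean_mask([1, 0, 1, 0, 1], len(data))]
print(result)  # [1 3 5]
```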