I have a list of boolean masks obtained by applying different search criteria to a dataframe. Here is an example list containing 4 masks:

mask_list = [mask1, mask2, mask3, mask4]

I would like to find the logical or of all of the masks in the list. In other words,

or_mask = mask_list[0] | mask_list[1] | mask_list[2] | mask_list[3]

Is there a compact way to accomplish this for a list containing an arbitrary number of masks? I understand that I can write a for loop as below, but is there a shorter, more pythonic way to do this?

for i in range(len(mask_list)):
    if i == 0:
        temp_mask_or = mask_list[i]
    else:
        temp_mask_or = temp_mask_or | mask_list[i]
alwaysCurious

2 Answers

You can use reduce (in Python 3 it has to be imported from functools):

or_(x, y) means x | y, so this will work:

from functools import reduce
from operator import or_

or_mask = reduce(or_, mask_list)

Edit: As suggested by JoeCondron, instead of operator.or_ you could use numpy.logical_or, which gives the same result for boolean masks and, per his comment, is around 4 to 6 times faster.
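To make the two options concrete, here is a minimal sketch comparing them. The three small masks are made up for illustration, and calling `np.logical_or.reduce` directly on the list is my own shorthand rather than something spelled out in the comment; treat it as one possible way to apply the suggestion.

```python
from functools import reduce
from operator import or_

import numpy as np

# Hypothetical example masks; in practice these would come from
# comparisons on a dataframe.
mask_list = [
    np.array([True, False, False, False]),
    np.array([False, True, False, False]),
    np.array([False, False, False, True]),
]

# Pairwise reduction with the | operator.
or_mask_reduce = reduce(or_, mask_list)

# Ufunc reduction: the list is converted to a 2-D array and OR-ed along axis 0.
or_mask_numpy = np.logical_or.reduce(mask_list)

print(or_mask_reduce)
print(or_mask_numpy)
```

Both variants should produce the same mask here. One thing to keep in mind: if the masks are pandas Series, the ufunc reduction most likely hands back a plain NumPy array without the index, while `reduce(or_, ...)` keeps the Series type.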

Pekka
  • I wasn't aware of these commands--thanks for the enlightenment! – alwaysCurious Aug 21 '15 at 07:16
  • You can also use `numpy.logical_or` as opposed to `or_`, which looks to be around 4 - 6 times faster. – JoeCondron Aug 21 '15 at 08:00
  • I have built two masks, and one happens to contain a NaN and is therefore of dtype object. Reducing the masks using `operator.or_` behaves as if the NaN were False. Using `numpy.logical_or` produces a mask of dtype object, and masking with it then results in a `ValueError: Cannot mask with non-boolean array containing NA / NaN values`. (I had initially tried `np.any(masks, axis=0)`, which results in the same error.) – bli Oct 01 '22 at 06:46
I normally use a similar loop to yours when combining masks, perhaps slightly differently:

combmask = mask_list[0].copy()  # copy, so that |= does not modify mask_list[0] in place
for mask in mask_list[1:]:
    combmask |= mask

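If that's not short enough, you can take advantage of the fact that you're only or-ing the masks and treat them as ints (True counts as 1 and False as 0):

import numpy as np
combmask = np.array(sum(mask_list), dtype=bool)  # the builtin bool works in every NumPy version; the np.bool alias is unavailable in some releases

If you look at sum(mask_list), you'll find it's just an array of integers: each element counts how many masks are True at that position, so any nonzero value becomes True.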

One possible caveat with the latter method, which I wasn't sure about, is whether it runs into problems when you sum more than 255 masks in which the same element is always True (i.e., 1). The underlying integer type used to store a boolean is only 8 bits afaik, so the sum could overflow that way. Perhaps numpy/Python automatically casts everything to a wider integer type before proceeding, but I didn't know that.


Edit: I'll leave the latter mention in, but it was easy to check:

In [51]: len(mask_list)
Out[51]: 4

In [52]: sum(mask_list).dtype
Out[52]: dtype('int64')

So even the sum of a short list of masks is converted to a 64-bit integer array here (which can then easily be converted to a boolean mask array), and you won't easily run into the integer limit.
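As a quick illustration of the summing trick, here is a small sketch with two made-up masks. The names `counts` and `combmask_sum` are only for this example, and the exact integer dtype of the sum may depend on your platform and NumPy version, so no dtype is shown.

```python
import numpy as np

# Hypothetical masks, chosen so that one position is True in both.
mask_list = [
    np.array([True, False, True]),
    np.array([True, False, False]),
]

# Built-in sum starts from 0, so the booleans are added up as integers:
# each element counts how many masks are True at that position.
counts = sum(mask_list)

# Any nonzero count means at least one mask was True there.
combmask_sum = counts.astype(bool)

print(counts)
print(combmask_sum)
```

The first position ends up with a count of 2, which still converts to True, so the result matches what or-ing the masks would give.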

  • Yes, I like your for loop much better. And very interesting about the mask just being a list of integers--I didn't know that. – alwaysCurious Aug 21 '15 at 07:17