Efficient way of merging two numpy masked arrays

Question

I have two numpy masked arrays which I want to merge. I'm using the following code:

import numpy as np

a = np.zeros((10000, 10000), dtype=np.int16)
a[:5000, :5000] = 1
am = np.ma.masked_equal(a, 0)

b = np.zeros((10000, 10000), dtype=np.int16)
b[2500:7500, 2500:7500] = 2
bm = np.ma.masked_equal(b, 0)

arr = np.ma.array(np.dstack((am, bm)), mask=np.dstack((am.mask, bm.mask)))
arr = np.prod(arr, axis=2)
plt.imshow(arr)

The problem is that the np.prod() operation is very slow (4 seconds in my computer). Is there an alternative way of getting a merged array in a more efficient way?

In your real data are the numbers in the arrays really always constants like that? Do you actually need to multiply the numbers, or is it only the masks you actually care about? — John Zwinck, Dec 01 '15 at 07:12
I'm intending to use this with arrays representing images. Ideally I'd like to preserve the values of the original arrays (I know my code doesn't preserve the values either as they get multiplied in the intersection)... — prl900, Dec 01 '15 at 09:23

John Zwinck · Accepted Answer · 2015-12-01T09:57:07.267

3

Instead of your last two lines using dstack() and prod(), try this:

arr = np.ma.array(am.filled(1) * bm.filled(1), mask=(am.mask * bm.mask))

Now you don't need prod() at all, and you avoid allocating the 3D array entirely.

edited Dec 01 '15 at 09:57

answered Dec 01 '15 at 07:23

John Zwinck

239,568
38
324
436

This proposal is one order of magnitude more efficient! The only problem is that array bm doesn't preserve its values. bm gets multiplied by am values inside the intersection zone and gets 0 values outside it. – prl900 Dec 01 '15 at 09:14
@MonkeyButter: Hmm that was strange, wasn't it? I've updated my answer to add `filled(1)` during multiplication to fix that issue. Please try it now. – John Zwinck Dec 01 '15 at 09:57

score 2 · Answer 2 · answered Jul 04 '18 at 06:46

I took another approach that may not be particularly efficient, but is reasonably easy to extend and implement.

(I know I'm answering a question that is over 3 years old with functionality that has been around in numpy a long time, but bear with me)

The np.where function in numpy has two main purposes (it is a bit weird), the first is to give you indices for a boolean array:

>>> import numpy as np

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

>>> m = (a % 3 == 0)
>>> m
array([[ True, False, False,  True],
       [False, False,  True, False],
       [False,  True, False, False]], dtype=bool)

>>> row_ind, col_ind = np.where(m)
>>> row_ind
array([0, 0, 1, 2])
>>> col_ind
array([0, 3, 2, 1])

The other purpose of the np.where function is to pick from two arrays based on whether the given boolean array is True/False:

>>> np.where(m, a, np.zeros(a.shape))
array([[ 0.,  0.,  0.,  3.],
       [ 0.,  0.,  6.,  0.],
       [ 0.,  9.,  0.,  0.]])

Turns out, there is also a numpy.ma.where which deals with masked arrays...

Given a list of masked arrays of the same shape, my code then looks like:

merged = masked_arrays[0]
for ma in masked_arrays[1:]:
    merged = np.ma.where(ma.mask, merged, ma)

As I say, not particularly efficient, but certainly easy enough to implement.

HTH

prl900 · Answer 3 · 2015-12-06T22:57:35.530

Inspired by the accepted answer I've found a simple way of merging masked arrays. It works making some logical operations on the masks and simply adding 0 filled arrays.

import numpy as np

a = np.zeros((1000, 1000), dtype=np.int16)
a[:500, :500] = 2
am = np.ma.masked_equal(a, 0)

b = np.zeros((1000, 1000), dtype=np.int16)
b[250:750, 250:750] = 3
bm = np.ma.masked_equal(b, 0)

c = np.zeros((1000, 1000), dtype=np.int16)
c[500:1000, 500:1000] = 5
cm = np.ma.masked_equal(c, 0)

bm.mask = np.logical_or(np.logical_and(am.mask, bm.mask), np.logical_not(am.mask))
am = np.ma.array(am.filled(0) + bm.filled(0), mask=(am.mask * bm.mask))

cm.mask = np.logical_or(np.logical_and(am.mask, cm.mask), np.logical_not(am.mask))
am = np.ma.array(am.filled(0) + cm.filled(0), mask=(am.mask * cm.mask))

plt.imshow(am)

I hope someone find this helpful sometime. Masked arrays doesn't seem to be very efficient though. So, if someone finds an alternative to merge arrays I'd be happy to know.

Update: Based on @morningsun comment this implementation is 30% faster and much simpler:

import numpy as np

a = np.zeros((1000, 1000), dtype=np.int16)
a[:500, :500] = 2
am = np.ma.masked_equal(a, 0)

b = np.zeros((1000, 1000), dtype=np.int16)
b[250:750, 250:750] = 3
bm = np.ma.masked_equal(b, 0)

c = np.zeros((1000, 1000), dtype=np.int16)
c[500:1000, 500:1000] = 5
cm = np.ma.masked_equal(c, 0)

am[am.mask] = bm[am.mask]
am[am.mask] = cm[am.mask]

plt.imshow(am)

I think this is simpler and also a little faster: `am[am.mask] = bm[am.mask]; am[am.mask] = cm[am.mask]` — , Dec 02 '15 at 21:15
@morningsun this is awesome and so simple! The appended arrays don't even need to be masked arrays. This is way faster than any of the other methods. Thank you very much! — prl900, Dec 02 '15 at 23:08

Efficient way of merging two numpy masked arrays

3 Answers3