numpy: why don't my arrays have the same size after applying a mask intersection?

Question

When I run .compressed() to turn a MaskedArray into a normal ndarray there are two fewer items. Any ideas why this could be?

mask_intersection = np.ma.mask_or(p_data.astype("float64"), cr_data.astype("float64"))

ipdb> mask_intersection.shape
(178, 163)
ipdb> p_data.shape
(178, 163)
ipdb> cr_data.shape
(178, 163)

ipdb> p_data[mask_intersection].flatten().size
16579
ipdb> cr_data[mask_intersection].compressed().size
16579
ipdb> mask_intersection.sum()
16579
ipdb> p_data[mask_intersection].compressed().size
16577 <-- wtf??

There are no NaNs:

ipdb> np.argwhere(np.isnan(cr_data[mask_intersection]))
array([], shape=(0, 1), dtype=int64)
ipdb> np.argwhere(np.isnan(p_data[mask_intersection]))
array([], shape=(0, 1), dtype=int64)

Here is a copy of p_data https://filebin.net/ua0rn59wl1c2txac

import pickle
with open("./p_data.obj", "rb") as f:
    p_data = pickle.load(f)

Hmm, this is also strange:

ipdb> p_data[mask_intersection].shape
(16579,)

Am I applying the mask intersection correctly? I don't mind ending up with a 1-D ndarray, but the change in shape is a little unexpected.

Interesting... I'm not sure what this means:

ipdb> cr_data[mask_intersection].mask.sum()
0
ipdb> p_data[mask_intersection].mask.sum()
2

I wonder if what is happening is that p_data.mask is stale, so I need to do

p_data_smaller = p_data[mask_intersection]
p_data_smaller.mask = False

or maybe I need to do

p_data.mask = mask_intersection

instead

I know `R` has `dput()` is there something similar in python to copy variables into pastebin? maybe I'll try `pickle.dumps()` — jaksco, Feb 05 '22 at 23:20
It will be easier for someone to help you if you provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). — Warren Weckesser, Feb 05 '22 at 23:26
Hmmm, @fakso I'm not able to visit the URL linked at the end of your post. It says `AccessDenied`. — , Feb 05 '22 at 23:38
I also added the mask_intersection to the same filebin thing as `mask.obj` — jaksco, Feb 05 '22 at 23:44

score 0 · Answer 1 · answered Feb 06 '22 at 00:27

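The solution seems to be to overwrite each array's mask rather than subsetting the array with the boolean mask. Indexing a MaskedArray with a boolean array keeps each selected element's own mask entry, so the two elements that p_data still masks are dropped by .compressed().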

import numpy as np

# Union of the two nodata masks: a cell is masked if it is nodata in either raster
mask_intersection = np.ma.mask_or(p_data.mask, cr_data.mask)
# Apply the combined mask to both rasters
p_data.mask = mask_intersection
cr_data.mask = mask_intersection
# Flatten each into a 1-D ndarray (not a MaskedArray) of the unmasked values
p_data_c = p_data.compressed()
cr_data_c = cr_data.compressed()
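
For anyone without access to the pickled files, here is a minimal sketch that reproduces the size mismatch and the fix. The small arrays and the `selector` name are made-up stand-ins for the real rasters, not the asker's data, so treat it as an illustration of the behaviour rather than a diagnosis of the original files.

```python
import numpy as np

# Hypothetical stand-ins for the two rasters; values and masks are invented.
p_data = np.ma.MaskedArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    mask=[[False, True, False], [False, False, True]],
)
cr_data = np.ma.MaskedArray(
    [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]],
    mask=[[False, False, False], [True, False, False]],
)

# A boolean selector that still includes two cells p_data has masked.
selector = np.ones(p_data.shape, dtype=bool)
selector[1, 0] = False

# Boolean indexing returns a 1-D MaskedArray that keeps each element's mask.
subset = p_data[selector]
print(subset.size)               # number of selected elements
print(subset.mask.sum())         # selected elements that are still masked
print(subset.compressed().size)  # compressed() drops those as well

# Fix: build the combined mask from the masks themselves, then assign it.
combined = np.ma.mask_or(p_data.mask, cr_data.mask)
p_fixed = p_data.copy()
cr_fixed = cr_data.copy()
p_fixed.mask = combined
cr_fixed.mask = combined

# Both compressed arrays now have the same length and stay aligned.
p_c = p_fixed.compressed()
cr_c = cr_fixed.compressed()
print(p_c, cr_c)
```

Working on copies here is only so the sketch leaves the inputs untouched; assigning to `.mask` on the originals, as in the answer, behaves the same way.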
