
My data consists of several arrays of the same length. I mask one array (y) and then use that masked array to mask a second array (x). The masking gets rid of values that indicate an equipment error (-9999). I then use np.where() to find where y is low (more than one standard deviation below the mean) and use the result to index x, so that I can see the values of x when y is low.

I have tried changing my mask several times, but none of the other NumPy masked-array operations gave me a different result. I also tried to write a logical condition that returns only the values where the mask is False, but I cannot do that inside the np.where() call.

```
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

# mask the equipment-error value
x = np.ma.masked_values(x, -9999)
y = np.ma.masked_values(y, -9999)

# threshold: one standard deviation below the mean of y
low_y = (y.mean() - np.std(y))

# values of x where y is low
x_masked = x[np.where(y < low_y)]
```

Printing x_masked gives:

```
>>> x_masked
masked_array(data=[0, 1, 2, 9],
             mask=False,
       fill_value=-9999)
```

We expect the mean of x_masked to be 0.5, i.e. (0 + 1)/2, but it is 3, i.e. (0 + 1 + 2 + 9)/4, because the x values at the positions where y is masked (-9999), namely 2 and 9, are included in x_masked.

Is there a way to exclude the masked values in order to only get the unmasked values?
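A hedged reading of the output above: indices 2 and 9 were selected, which suggests that np.where() treats the comparison result `y < low_y` as a plain boolean array and ignores its mask. A minimal sketch of one way around this, assuming the same x and y as in the question, is to fill the masked positions of the condition with False (via `MaskedArray.filled`) before indexing x, so masked entries can never be selected. The name `is_low` is illustrative:

```python
import numpy as np

x = np.ma.masked_values(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), -9999)
y = np.ma.masked_values(np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10]), -9999)

# Masked-aware statistics: the two -9999 entries are ignored.
low_y = y.mean() - y.std()

# (y < low_y) is still a masked array; filled(False) turns it into a plain
# boolean array in which every masked position counts as "not low".
is_low = (y < low_y).filled(False)

x_masked = x[is_low]
print(x_masked.mean())  # 0.5
```

Because `is_low` is an ordinary boolean array, it can also be passed to np.where() if integer indices are needed.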

danrod13

2 Answers


I think you want to keep x only where y != -9999. If you make this change to your code, it works as you expect.

You could also just use np.where to do the masking on the plain arrays:

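```
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

# keep only the positions where y is valid, in both arrays
x = x[np.where(y != -9999)]
y = y[np.where(y != -9999)]

low_y = (y.mean() - np.std(y))

x_masked = x[np.where(y < low_y)]
print(x_masked)  # [0 1]
```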
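As a small variant of the approach above (a sketch, not part of the original answer), the boolean arrays can index x and y directly: np.where() called with a single boolean argument only converts it into integer indices, so both forms select the same elements. The names `valid` and `is_low` are illustrative:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

valid = y != -9999                 # boolean array, True where y is usable
x, y = x[valid], y[valid]          # boolean indexing, no np.where needed

is_low = y < (y.mean() - y.std())  # more than one std below the mean
x_masked = x[is_low]

# np.where(is_low) only turns the boolean array into integer indices,
# so both forms select the same elements.
assert np.array_equal(x_masked, x[np.where(is_low)])
print(x_masked)  # [0 1]
```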
stahamtan
  • When I do that I get the same result as my initial code. This is what I got when I ran your code: masked_array(data=[0, 1, 2, 9] – danrod13 Sep 11 '19 at 18:28
  • I am surprised with your result being `[0, 1, 2, 9]`. Try `x = np.ma.masked_values( y, -9999 )` in your code, you should get `masked_array(data=[0, 1, --, --]...` – stahamtan Sep 11 '19 at 18:46
  • `x = np.ma.masked_values( y, -9999 )` worked! Is there a way to get ma.masked_values to mask out other masked values? Would it work if I wrote: `x = np.ma.masked_values( y, -- )` or something similar? – danrod13 Sep 11 '19 at 19:14
  • If I understand your question correctly, you do not want `--` to show in your array? If so, you just follow my code above using `np.where` – stahamtan Sep 11 '19 at 19:16
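A sketch of one way to make x carry the same mask as y, assuming the masked y from the question: `np.ma.getmaskarray` returns y's mask as a full boolean array, which can be attached to x; `compressed()` then drops the masked entries from both arrays, leaving plain arrays of equal length for the low-y selection. The names `x_valid`, `y_valid` and `x_low` are illustrative:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.ma.masked_values(np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10]), -9999)

# Give x the same mask as y (getmaskarray always returns a full boolean array).
x = np.ma.masked_array(x, mask=np.ma.getmaskarray(y))

# Drop every masked position; compressed() returns plain 1-D ndarrays.
x_valid = x.compressed()
y_valid = y.compressed()

low_y = y_valid.mean() - y_valid.std()
x_low = x_valid[y_valid < low_y]
print(x_low)  # [0 1]
```

Since both arrays share one mask, their compressed versions stay aligned element by element.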

Since version 1.8, NumPy has nanmean and nanstd for handling missing data. In your case, the -9999 indicates an error state, so I think this is a good use case for numpy.nan.

```
In [76]: y = np.where(y==-9999, np.nan, y)

In [77]: low_y = (np.nanmean(y) - np.nanstd(y))

In [78]: low_y
Out[78]: 1.8177166753143883

In [79]: x_masked = x[ np.where( y < low_y ) ]  # [0, 1]
```
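To illustrate the point about nanmean (a sketch assuming the original plain arrays, with the converted array named `y_nan` for illustration): ordinary reductions such as np.mean return nan as soon as one element is NaN, while np.nanmean and np.nanstd skip NaN entries. Comparisons with NaN evaluate to False, so the error positions drop out of the selection on their own:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([0, 1, -9999, 3, 4, 5, 6, 7, 8, -9999, 10])

# Replace the error code with NaN; np.where returns a float array here.
y_nan = np.where(y == -9999, np.nan, y)

print(np.mean(y_nan))     # nan: plain reductions propagate NaN
print(np.nanmean(y_nan))  # mean of the 9 valid values (44/9)

low_y = np.nanmean(y_nan) - np.nanstd(y_nan)

# NaN < low_y evaluates to False, so the NaN positions are never selected.
x_low = x[y_nan < low_y]
print(x_low)              # [0 1]
```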
CT Zhu
  • This works. So now, if I want the mean of any array to which I have applied np.nan, I need to use np.nanmean(), right? – danrod13 Sep 16 '19 at 18:29