Covariance Matrix with Mask - Python

Question

I'm using the np.ma module to calculate the covariance of two overlapping image arrays that contain noData values. The code is as follows:

```python
import numpy as np

# Two overlapping image arrays; -9999 marks noData pixels
arr1 = np.array([1638, 1753, 1601, 1819, -9999, 1627, 1400, 1379, 1055])
arr2 = np.array([-9999, 1455, 1973, 1330, 1915, 1842, 1816, 1218, -9999])
images = np.vstack((arr1.ravel(), arr2.ravel()))
images = np.ma.array(images, mask=images == -9999)  # mask the noData pixels
cov_mat = np.ma.cov(images, bias=True)
```

The resulting covariance matrix is:

```
[[53070.25 -8273.07142857143]
 [-8273.07142857143 80860.40816326531]]
```

My issue with the result is that the first diagonal value is calculated from arr1 using 8 observations (it contains a single noData value), while the last diagonal value is calculated from arr2 using only 7 observations (it contains two noData values).

Is there a way to tell the cov function to divide by only the maximum number of observations when calculating the covariance, or will I have to perform the operation manually?
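
To make the differing divisors visible, here is a minimal sketch that reproduces the masked result by hand. It assumes, based on the output shown above, that each row is centred on the mean of its own valid pixels and that each entry is divided by the number of pixels valid in both rows involved. The names `valid`, `counts`, `centred`, and `manual` are illustrative only and not part of NumPy.

```python
import numpy as np

arr1 = np.array([1638, 1753, 1601, 1819, -9999, 1627, 1400, 1379, 1055])
arr2 = np.array([-9999, 1455, 1973, 1330, 1915, 1842, 1816, 1218, -9999])
stacked = np.vstack((arr1, arr2))
images = np.ma.array(stacked, mask=stacked == -9999)

# 1 where a pixel is valid, 0 where it is noData
valid = (~np.ma.getmaskarray(images)).astype(int)

# counts[i, j] = number of pixels valid in both row i and row j
counts = valid @ valid.T          # [[8, 6], [6, 7]] for this data

# Centre each row on the mean of its own valid pixels;
# masked pixels are filled with 0 so they add nothing to the sums
centred = (images - images.mean(axis=1)[:, None]).filled(0.0)

# Biased covariance with a separate divisor for every entry
manual = (centred @ centred.T) / counts

print(counts)
print(manual)
```

The diagonal of `counts` shows the 8 and 7 observations described above, and the off-diagonal entries use only the 6 pixels that are valid in both arrays.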

What do you mean by the maximum number of observations? If you mean 8 observations for the element at [1,1], then it will be wrong, because the masked value will have to be taken into account. — Oliver W., Apr 06 '15 at 11:35
During calculation of the covariance matrix, the formula divides by the number of observations. If we calculate the covariance of arr1 with itself, the number of observations used is 8 (since bias=True). If we calculate the covariance of arr2 with itself, the number of observations used is 7. I want the number of observations to be the same in both cases. Is that possible with np.ma.cov? — rsumbaly, Apr 06 '15 at 11:39
Yes, like I've shown you in [the answer to your previous question](http://stackoverflow.com/a/29457110/2476444). If you use the 2nd way of calling, the mask will be extended such that only elements where there is no mask present for any of the variables (images) will be taken into consideration for the covariance calculation. — Oliver W., Apr 06 '15 at 11:42
Is that the answer you were looking for? Right now it's unclear whether this problem is resolved. Others might want to have a stab at this, if my answer wasn't what you were looking for? — Oliver W., Apr 06 '15 at 13:49
Thanks for the answer. What I actually want, though, is that even with the elements masked, both methods use the original number of observations when calculating the covariance, which in my case is 9, rather than the number of valid finite elements. — rsumbaly, Apr 07 '15 at 06:51
In that case, why not simply rescale the resultant covariance matrix with `(nbr_valid_finite - 1)/ (nbr_total_observations - 1)`? If that is not what you're after, then I'm getting the feeling you want a sample that *does* take into account the invalid elements, which will result in an incorrect (=unwanted) cov. matrix. That's the same as simply taking `np.cov` then, no masking involved. — Oliver W., Apr 07 '15 at 07:28
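
The two suggestions from the comments can be sketched as follows, under stated assumptions. Option 1 keeps only the pixels that are valid in both images, so every entry of the matrix shares one divisor. Option 2 rescales the masked result so that every entry is effectively divided by the total pixel count of 9. It assumes that np.ma.cov divided each entry by the matching count of jointly valid pixels, which is consistent with the output in the question. Because the question uses bias=True, the factor here is `counts / n_total` rather than the `(n - 1)` form in the comment. The names `common_valid`, `counts`, `n_total`, `cov_common`, and `cov_total` are illustrative only.

```python
import numpy as np

arr1 = np.array([1638, 1753, 1601, 1819, -9999, 1627, 1400, 1379, 1055])
arr2 = np.array([-9999, 1455, 1973, 1330, 1915, 1842, 1816, 1218, -9999])
stacked = np.vstack((arr1, arr2))
images = np.ma.array(stacked, mask=stacked == -9999)
mask = np.ma.getmaskarray(images)

# Option 1: drop every pixel that is noData in either image,
# then compute an ordinary biased covariance on what is left
common_valid = ~mask.any(axis=0)
cov_common = np.cov(stacked[:, common_valid], bias=True)

# Option 2: undo the per-entry divisor and divide by the total pixel count
valid = (~mask).astype(int)
counts = valid @ valid.T            # jointly valid pixels per entry
n_total = stacked.shape[1]          # 9 pixels per image, masked or not
cov_total = np.ma.cov(images, bias=True) * counts / n_total

print(cov_common)
print(cov_total)
```

Note that Option 2 still centres each row on the mean of its valid pixels only, so dividing by 9 shrinks every entry towards zero. Whether that is statistically meaningful is the concern raised in the comments.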
