
I have a 10000 x 22 array (observations x features) and I fit a Gaussian mixture with one component as follows:

import sklearn.mixture

mixture = sklearn.mixture.GaussianMixture(n_components=1, covariance_type='full').fit(my_array)

Then I want to calculate the mean and the covariance of the conditional distribution of the first two features given the rest, as per equations 2.81 and 2.82 on p. 87 of Bishop's Pattern Recognition and Machine Learning. What I do is the following:

import numpy as np

covariances = mixture.covariances_  # shape = (1, 22, 22): 1 component, one 22x22 covariance matrix
means = mixture.means_              # shape = (1, 22): one mean per feature
dependent_data = my_array[:, 0:2]   # shape = (10000, 2)
conditional_data = my_array[:, 2:]  # shape = (10000, 20)
mu_a = means[:, 0:2]  # mean of the dependent variables
mu_b = means[:, 2:]   # mean of the independent variables
cov_aa = covariances[0, 0:2, 0:2]  # cov of the dependent vars
cov_bb = covariances[0, 2:, 2:]    # cov of the independent vars
cov_ab = covariances[0, 0:2, 2:]
cov_ba = covariances[0, 2:, 0:2]
A = conditional_data.transpose() - mu_b.transpose()          # shape = (20, 10000)
B = cov_ab.dot(np.linalg.inv(cov_bb))                        # shape = (2, 20)
conditional_mu = mu_a + B.dot(A).transpose()                 # shape = (10000, 2)
conditional_cov = cov_aa - cov_ab.dot(np.linalg.inv(cov_bb)).dot(cov_ba)  # shape = (2, 2)
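
For reference, the formulas the code above implements (Bishop, eqs. 2.81 and 2.82), writing a for the first two features and b for the rest, are:

\mu_{a|b} = \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (x_b - \mu_b)           (2.81)
\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba}     (2.82)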

My problem is that, on calculating conditional_mu and conditional_cov, I get the following shapes:

conditional_mu.shape
(10000, 2)
conditional_cov.shape
(2,2)

I was expecting conditional_mu to have shape (1, 2), because I only want the means of the first two features conditioned on the rest. Why am I getting a mean for each observation instead?

azal
  • Your `A` has shape `(20, 10000)`, therefore `(BA)^T` has shape `(10000, 2)`, which is where your observed shape comes from. – Paul Panzer Jan 20 '18 at 18:32
  • @PaulPanzer yeah, I know why the result has that specific shape. I'm asking whether it's supposed to be like that or not, because it doesn't seem right. – azal Jan 20 '18 at 18:40
  • In that case it's not really a programming question, is it? Maybe you'd like to move it to a more appropriate place like one of the math sites? – Paul Panzer Jan 20 '18 at 18:58
  • @PaulPanzer from my experience there are many computer scientists who are familiar with this stuff, so I'll leave it here and also post it elsewhere to increase the chances of an answer. – azal Jan 20 '18 at 19:10
  • Sure, your choice. – Paul Panzer Jan 20 '18 at 19:19

1 Answer


Yes, that is the expected dimension.

For each data point, the independent features are fixed and the dependent features follow a normal distribution. Each data point therefore gives a different mean for the dependent features, depending on the values of its independent features.

Since you have 10000 data points, you get 10000 conditional means for the dependent features, one per data point.
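
Here is a minimal sketch, using hypothetical synthetic data, to illustrate the point: each observation gets its own conditional mean, while the conditional covariance is shared by all of them.

import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical synthetic data: 1000 observations, 4 correlated features.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4)) + rng.normal(size=4)

gmm = GaussianMixture(n_components=1, covariance_type='full').fit(data)
cov = gmm.covariances_[0]  # (4, 4)
mu = gmm.means_[0]         # (4,)

# Condition the first two features (a) on the remaining two (b), Bishop eqs. 2.81-2.82.
cov_aa, cov_ab = cov[:2, :2], cov[:2, 2:]
cov_ba, cov_bb = cov[2:, :2], cov[2:, 2:]
gain = cov_ab @ np.linalg.inv(cov_bb)

x_b = data[:, 2:]                           # observed values of b, one row per observation
cond_mu = mu[:2] + (x_b - mu[2:]) @ gain.T  # (1000, 2): one conditional mean per observation
cond_cov = cov_aa - gain @ cov_ba           # (2, 2): shared by all observations

print(cond_mu.shape, cond_cov.shape)        # (1000, 2) (2, 2)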

Siong Thye Goh
  • Thanks @Siong Thye Goh. What about the covariance though? – azal Jan 20 '18 at 23:15
  • The dimension is also expected for the covariance matrix. Notice that equation 2.82 is independent of x_b. That is, the data share the same covariance matrix; only the mean differs and depends on x_b. – Siong Thye Goh Jan 21 '18 at 03:47
  • So the covariance matrix should be 1000 x 1000? Why is it 20x20 when I fit the Gaussian mixture with one component? And how can I get from these results the mean and covariance in such a way that I can plot them as an ellipsoid? (see the sketch after these comments) – azal Jan 21 '18 at 14:03
  • The covariance matrix should be 2x2, since there are only 2 dependent features. Fixing the independent features, you get the Gaussian distribution of the dependent features, which has a mean (i.e. one of the rows of your conditional mean) and the 2x2 covariance matrix. – Siong Thye Goh Jan 21 '18 at 14:07
  • Ok. So, based on what you say, my calculations up to this point are correct? What am I missing then? What should I do to get the 2 conditional means for my 2 dependent features? – azal Jan 21 '18 at 15:03
  • Each row of conditional_mu is a conditional mean. Smaller example: suppose height and weight are jointly normally distributed and you observed the weight but not the height. Given a particular weight, you would estimate a different distribution for the height. Suppose (h1, w1=45kg) is one data point and (h2, w2=50kg) is another; then one row gives you the mean of the distribution of (h1 | w1 = 45kg) and another row gives you the mean of (h2 | w2 = 50kg). The means are different, though the variance is the same. – Siong Thye Goh Jan 21 '18 at 15:38
  • If I take the mean of each column of the 10000 conditional means, I get the same result as without conditioning. That's why I am confused. – azal Jan 21 '18 at 15:55
  • You might want to inspect matrix B and see what you get. The phenomenon that you describe is possible if the first two features and the remaining features are independent. – Siong Thye Goh Jan 21 '18 at 16:49
  • Also, you want to track down why the first moment of every column becomes equal. Also inspect A: if conditional_data is not constant, A should not be constant. Is B.dot(A).transpose() = 0? Print some components out or work with smaller examples to find the bug, if any, or to improve understanding if there is no bug. – Siong Thye Goh Jan 21 '18 at 17:18
  • I understand that my results are correct now. I have two dependent features, so I have their conditional covariance of shape (2,2), and each observation has a different conditional mean, giving shape (num_of_observations=10000, num_of_features). The thing is that I was expecting the "weighted" means of the dependent features to be different from the original "unweighted" means, and that's why I got confused. So, based on what you're saying, this means that the features are independent? – azal Jan 21 '18 at 22:14
  • Look at cov_ab; do you get a zero matrix? If it is zero, then B is zero, and conditional_mu = mu_a. – Siong Thye Goh Jan 22 '18 at 03:42
  • Nope, it's not. It's a (2,20) array with no zero elements. – azal Jan 22 '18 at 11:16
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163635/discussion-between-siong-thye-goh-and-gelazari). – Siong Thye Goh Jan 22 '18 at 11:22
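
Regarding the plotting question in the comments above, here is a minimal sketch (not part of the original discussion) for drawing the ellipse of a 2x2 covariance matrix such as conditional_cov, centred at one of the conditional means. The numeric values below are hypothetical stand-ins for conditional_mu[i] and conditional_cov.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_cov_ellipse(mean, cov, n_std=2.0, ax=None, **kwargs):
    """Draw the n_std-sigma ellipse of a 2D Gaussian with the given mean and covariance."""
    ax = ax or plt.gca()
    # Eigendecomposition gives the ellipse axes (eigenvectors) and their half-lengths (sqrt of eigenvalues).
    vals, vecs = np.linalg.eigh(cov)
    angle = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1]))  # orientation of the major axis
    width, height = 2 * n_std * np.sqrt(vals[::-1])         # full axis lengths, major axis first
    ax.add_patch(Ellipse(xy=mean, width=width, height=height, angle=angle, fill=False, **kwargs))
    return ax

# Hypothetical stand-ins for conditional_mu[i] and conditional_cov:
mean_i = np.array([0.5, -1.0])
cov_2x2 = np.array([[1.0, 0.4], [0.4, 0.8]])
ax = plot_cov_ellipse(mean_i, cov_2x2, n_std=2.0, edgecolor='C0')
ax.set_xlim(-4, 5)
ax.set_ylim(-5, 3)
plt.show()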