
I have trained a Gaussian Mixture Model with sklearn and I am trying to obtain the unnormalized responsibilities of a data point given the cluster means and variances.

GMM.predict_proba unfortunately returns the normalized probabilities, which sum to one, but I need the raw ones.

I have tried the following (GMM is the fitted mixture model):

import numpy as np
from sklearn import mixture
lpr = (mixture.log_multivariate_normal_density(X, GMM.means_, GMM.covars_, GMM.covariance_type) + np.log(GMM.weights_))
probs = np.exp(lpr)

But the probabilities I obtained are bigger than 1.

What am I doing wrong?

1 Answer


lpr contains the weighted log densities of the individual Gaussian components. To get the probability of the full GMM, these have to be summed over the components in log space (logsumexp). The following code shows this:

from sklearn.utils.extmath import logsumexp

lpr = (mixture.log_multivariate_normal_density(X, GMM.means_, GMM.covars_, GMM.covariance_type) + np.log(GMM.weights_)) # weighted log densities of the individual components
logprob = logsumexp(lpr, axis=1) # logsumexp over components gives the log density of the full GMM
probs = np.exp(logprob) # density of the mixture at each sample
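
If you want to check that these unnormalized values are consistent with what sklearn gives you, a small sketch like the following (assuming the same fitted GMM object and the lpr/logprob arrays from above) recovers the normalized responsibilities by subtracting the per-sample log normalizer and compares them with GMM.predict_proba:

# Normalized responsibilities: divide each weighted component density by the
# per-sample mixture density, done in log space for numerical stability.
responsibilities = np.exp(lpr - logprob[:, np.newaxis])

# These should match the normalized output of the fitted model.
print(np.allclose(responsibilities, GMM.predict_proba(X)))  # expected: True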