I've created a Gaussian Mixture Model in Python with 11 components based on 8-dimensional data (I picked 11 components because that's what minimized the BIC score). I now have a test sample of data (50 samples of 8-dimensional data), and I want to evaluate the probability that each of these 50 samples can be described by this GMM. I have this so far:

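import numpy as np
from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=11, random_state=0).fit(train_data)
# Per-sample log-likelihood of the test data under the fitted mixture
loglike_test = gm.score_samples(test_data)
# Exponentiate the log-likelihoods
probs = np.exp(loglike_test)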

But these probabilities do not all fall between 0 and 1. I also computed the probabilities for train_data, and many of the values are much greater than 1 (on the order of 10^3). How can I transform these probabilities to percentages that make sense?

curious_cosmo
  • what do you mean by the probability that each sample can be described by the GMM? Do you mean the probability that the point was drawn from each cluster? i.e. returning a list of 11 probabilities (one for each component) for each sample? – Galletti_Lance Nov 15 '22 at 15:54
  • @Galletti_Lance I'd like the probability that the point is drawn from any of the clusters. But if I knew how to return the probabilities for each cluster like you describe, that would also be helpful. – curious_cosmo Nov 16 '22 at 16:05
  • `gm.predict_proba(test_data)` might be what you're looking for. Then you can get the `max` probability and see if it's close to 1. If for most data points it's not close to 1 then maybe a GMM does not describe the data well. – Galletti_Lance Nov 16 '22 at 16:36
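
A minimal sketch of the suggestion in the comment above, offered as an illustration rather than a definitive answer. The synthetic arrays standing in for train_data and test_data and the reduced component count are assumptions, since the real data is not shown. The last step, ranking each test log-likelihood against the training log-likelihoods, is not from the thread; it is one possible way to get a score between 0 and 1.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for train_data / test_data (hypothetical; the real data is not shown).
rng = np.random.default_rng(0)
train_data = rng.normal(scale=0.1, size=(500, 8))
test_data = rng.normal(scale=0.1, size=(50, 8))

# Fewer components than the 11 in the question, only to keep the toy fit small.
gm = GaussianMixture(n_components=3, random_state=0).fit(train_data)

# Posterior membership probabilities: one row per sample, one column per
# component, and each row sums to 1.
resp = gm.predict_proba(test_data)
max_resp = resp.max(axis=1)

# score_samples returns log densities. Exponentiating gives a probability
# density, not a probability, so values above 1 are possible.
density = np.exp(gm.score_samples(test_data))

# Assumed approach (not from the thread): the fraction of training points
# whose log-likelihood is no higher than that of each test point.
train_ll = gm.score_samples(train_data)
test_ll = gm.score_samples(test_data)
pct_rank = (train_ll[None, :] <= test_ll[:, None]).mean(axis=1)
```

Here max_resp says how confidently each sample is assigned to a single component, while pct_rank says how typical the sample's likelihood is compared with the training set. Neither is "the probability that the GMM describes the sample", so which one is appropriate depends on what the percentage is meant to express.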

0 Answers