I am trying to visualize the fitted Gaussian distributions from a Gaussian Mixture Model and can't figure out how. I have seen examples (here and here) for visualizing the fitted distributions of a one-dimensional model, but I can't work out how to apply them to a model with 3 features. Is it possible to visualize the fitted distributions for each training feature?

I have named my model estimator and trained it with X_train:

estimator = GaussianMixture(covariance_type='full', init_params='kmeans', max_iter=100,
        means_init=array([[ 0.41297,  3.39635,  2.68793],
       [ 0.33418,  3.82157,  4.47384],
       [ 0.29792,  3.98821,  5.78627]]),
        n_components=3, n_init=1, precisions_init=None, random_state=0,
        reg_covar=1e-06, tol=0.001, verbose=0, verbose_interval=10,
        warm_start=False, weights_init=None)

The first 6 samples of X_train look like:

X_train[:6,:] = array([[  0.29818663,   3.72573161,   4.19829702],
       [  0.24693619,   4.33026266,  10.74416161],
       [  0.21932575,   3.98019433,   8.02464581],
       [  0.24426255,   4.41868353,  10.52576923],
       [  0.16577695,   4.35316706,  12.63638592],
       [  0.28952628,   4.03706551,   8.03804016]])

The shape of X_train is (3753L, 3L). My plotting routine to plot the first feature's fitted Gaussian distributions is as follows:

fig, (ax1,ax2,a3) = plt.subplots(nrows=3)
#Domain for pdf
x = np.linspace(0,0.8,3753)
logprob = estimator.score_samples(X_train)
resp = estimator.predict_proba(X_train)
pdf = np.exp(logprob)
pdf_individual = resp * pdf[:, np.newaxis]
ax1.hist(X_train[:,0],30, normed=True, histtype='stepfilled', alpha=0.4)    
ax1.plot(x, pdf, '-k')
ax1.plot(x, pdf_individual, '--k')
ax1.text(0.04, 0.96, "Best-fit Mixture",
        ha='left', va='top', transform=ax.transAxes)
ax1.set_xlabel('$x$')
ax1.set_ylabel('$p(x)$')  
plt.show()    

But that does not seem to work. Any ideas on how to make this work?

Julien Marrec
dubbbdan

1 Answer

If I load your sample data and fit the estimator:

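import numpy as np
import matplotlib.pyplot as plt

# estimator is the GaussianMixture instance defined in the question
X_train = np.array([[  0.29818663,   3.72573161,   4.19829702],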
   [  0.24693619,   4.33026266,  10.74416161],
   [  0.21932575,   3.98019433,   8.02464581],
   [  0.24426255,   4.41868353,  10.52576923],
   [  0.16577695,   4.35316706,  12.63638592],
   [  0.28952628,   4.03706551,   8.03804016]])
estimator.fit(X_train)

A couple of issues: the linspace length doesn't match the length of pdf, and you use ax.transAxes without ever defining ax. Here's a version that works:

fig, (ax1, ax2, ax3) = plt.subplots(nrows=3)

logprob = estimator.score_samples(X_train)
resp = estimator.predict_proba(X_train)

Here the length of x must match the length of logprob (and therefore of pdf):

#Domain for pdf
x = np.linspace(0,0.8,len(logprob))

pdf = np.exp(logprob)
pdf_individual = resp * pdf[:, np.newaxis]
# density=True replaces normed=True, which newer Matplotlib releases no longer accept
ax1.hist(X_train[:,0], 30, density=True, histtype='stepfilled', alpha=0.4)
ax1.plot(x, pdf, '-k')
ax1.plot(x, pdf_individual, '--k')

Here, ax1.transAxes is expected:

ax1.text(0.04, 0.96, "Best-fit Mixture",
        ha='left', va='top', transform=ax1.transAxes)
ax1.set_xlabel('$x$')
ax1.set_ylabel('$p(x)$')  
plt.show()

Result plot
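
As for seeing the fitted distribution of each feature separately, one possible approach is to plot the marginal density of the mixture along that feature. The marginal of a multivariate Gaussian along one feature is a 1-D Gaussian with that feature's mean and the matching diagonal entry of the covariance matrix, so the marginal of the mixture is the weighted sum of those 1-D Gaussians. The sketch below assumes covariance_type='full', as in the question, so the covariances have shape (n_components, n_features, n_features). The helper name marginal_mixture_pdf is made up for illustration and is not part of scikit-learn.

```python
import numpy as np


def marginal_mixture_pdf(x, weights, means, covariances, feature):
    """Marginal density of one feature under a full-covariance Gaussian mixture.

    Hypothetical helper (not a scikit-learn function).
    Returns (total, parts): the mixture density on the grid x, and an array
    of shape (len(x), n_components) holding each weighted component density.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)                             # (K,)
    mu = np.asarray(means, dtype=float)[:, feature]                  # (K,)
    var = np.asarray(covariances, dtype=float)[:, feature, feature]  # (K,)

    # 1-D normal density of every component at every grid point,
    # scaled by the component weight.
    z = (x[:, None] - mu[None, :]) ** 2 / var[None, :]
    parts = w * np.exp(-0.5 * z) / np.sqrt(2.0 * np.pi * var)
    return parts.sum(axis=1), parts
```

With a fitted model, the arguments would be estimator.weights_, estimator.means_ and estimator.covariances_. For feature j, build a grid such as x = np.linspace(X_train[:, j].min(), X_train[:, j].max(), 500), then draw the histogram of X_train[:, j] on one axis and plot total with '-k' and parts with '--k' on top of it, repeating for each of the three axes. Unlike the version above, the grid here is independent of the number of training samples.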

Julien Marrec