I'm fitting a mixture of two Gaussians to 1D data (over 1000 points).

The peaks of the sum of the two Gaussians appear shifted to the left relative to the peaks of the histogram. I assume this is because my data has a cut-off at around 0.5.

The green and red lines are the two best-fitting Gaussians; black is their sum. Here's the plot: [Plot]

Is there any way I can ensure that the peaks match, even though there is a lack of data points on the right?

I'm using:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import mixture
    import scipy.stats as stats

    # scikit-learn expects shape (n_samples, n_features)
    data = np.asarray(data).reshape(-1, 1)

    g = mixture.GaussianMixture(n_components=2, covariance_type='full')
    g.fit(data)
    weights = g.weights_
    means = g.means_.ravel()            # shape (2,)
    covars = g.covariances_.ravel()     # variances, shape (2,)
    sigmas = np.sqrt(covars)            # standard deviations

    num_bins = 50
    # density=True replaces the removed `normed` argument
    n, bins, patches = plt.hist(data.ravel(), num_bins, density=True, facecolor='blue', alpha=0.2)
    x = np.linspace(bins[0], bins[-1], 500)  # evaluation grid for the fitted curves
    red = weights[0] * stats.norm.pdf(x, means[0], sigmas[0])
    green = weights[1] * stats.norm.pdf(x, means[1], sigmas[1])
    plt.plot(x, red, c='red')
    plt.plot(x, green, c='green')
    plt.plot(x, red + green, c='black')
Has QUIT--Anony-Mousse
Anna
  • Maybe the *histogram* is shifted to the right? This is a fairly common error with histograms. – Has QUIT--Anony-Mousse Jan 06 '17 at 11:03
  • @Anony-Mousse I suppose I can align my bins to the left, which will move the peak, but I think the bigger problem is that GMM sacrifices the quality of the fit to make sure that both fitted Gaussians start and end with the data. For example, the green Gaussian on my plot could have been taller and wider and shifted more to the right, which would represent the bump at -0.5 better. – Anna Jan 06 '17 at 12:42
  • Well, your data may just be less "normal" than you think; GMM probably does a decent job at fitting. But yes, it does not use an estimator for right-censored distributions. You'd need your own code to do that. – Has QUIT--Anony-Mousse Jan 06 '17 at 16:06
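
As a rough illustration of the "own code" mentioned in the last comment, here is a minimal sketch of a maximum-likelihood fit that accounts for the cut-off. It assumes the data are truncated (nothing is observed above a known bound `upper`), which may or may not match the real data. The function names `neg_log_lik` and `fit_truncated_mixture` are hypothetical, not part of scikit-learn or SciPy. The only change from an ordinary mixture likelihood is that the density is renormalised by the probability mass lying below the bound.

```python
import numpy as np
from scipy import optimize, stats


def neg_log_lik(params, x, upper):
    """Negative log-likelihood of a two-Gaussian mixture truncated above at `upper`.

    params = (logit of weight 1, mean 1, mean 2, log sigma 1, log sigma 2);
    the transforms keep the weight in (0, 1) and the sigmas positive.
    """
    logit_w, mu1, mu2, log_s1, log_s2 = params
    w = 1.0 / (1.0 + np.exp(-logit_w))
    s1, s2 = np.exp(log_s1), np.exp(log_s2)
    # Untruncated mixture density at the observed points
    dens = w * stats.norm.pdf(x, mu1, s1) + (1 - w) * stats.norm.pdf(x, mu2, s2)
    # Probability mass of the mixture below the cut-off (normalising constant)
    mass = w * stats.norm.cdf(upper, mu1, s1) + (1 - w) * stats.norm.cdf(upper, mu2, s2)
    log_dens = np.log(np.maximum(dens, 1e-300))
    log_mass = np.log(np.maximum(mass, 1e-300))
    return -np.sum(log_dens) + x.size * log_mass


def fit_truncated_mixture(x, upper, init):
    """Minimise the truncated negative log-likelihood starting from `init`."""
    x = np.asarray(x, dtype=float).ravel()
    res = optimize.minimize(
        neg_log_lik, np.asarray(init, dtype=float), args=(x, upper),
        method="Nelder-Mead", options={"maxiter": 5000},
    )
    logit_w, mu1, mu2, log_s1, log_s2 = res.x
    w = 1.0 / (1.0 + np.exp(-logit_w))
    return {
        "weights": (w, 1.0 - w),
        "means": (mu1, mu2),
        "sigmas": (np.exp(log_s1), np.exp(log_s2)),
        "nll": res.fun,
    }
```

A reasonable starting point `init` would be the parameters from the ordinary `GaussianMixture` fit, converted to the same parameterisation (logit of the first weight, the two means, and the logs of the two standard deviations). The optimiser can still land in a poor local optimum, so the result should be checked against the histogram.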

1 Answer

You are simply adding the green Gaussian to the red one. Since the two Gaussians overlap heavily, the peaks of their sum do not coincide with the peaks of the individual components. If you want the peaks to match, you'd have to not add the green Gaussian to the red Gaussian as the red Gaussian approaches its peak.
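
To make the overlap effect concrete, here is a small sketch that locates the peaks of a two-Gaussian sum on a grid and compares them with the component means. The weights, means, and standard deviations are made-up illustrative values, not the fitted ones from the question, and `mixture_peaks` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def mixture_peaks(weights, means, sigmas, grid):
    """Return the grid points where the weighted sum of Gaussians has a local maximum."""
    total = np.zeros_like(grid)
    for w, m, s in zip(weights, means, sigmas):
        total += w * stats.norm.pdf(grid, m, s)
    # An interior point is a peak if it is higher than its left neighbour
    # and at least as high as its right neighbour.
    inner = total[1:-1]
    is_peak = (inner > total[:-2]) & (inner >= total[2:])
    return grid[1:-1][is_peak]


# Illustrative (assumed) parameters: two equally weighted, overlapping components
grid = np.linspace(-2.0, 2.0, 4001)
peaks = mixture_peaks([0.5, 0.5], [-0.5, 0.3], [0.3, 0.3], grid)
print("component means:", [-0.5, 0.3])
print("peaks of the sum:", peaks)
```

With these values each peak of the sum is pulled toward the other component: the left peak lies to the right of -0.5 and the right peak lies to the left of 0.3. This is because each component's tail has a nonzero slope under the other component's maximum.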

lito