
I'm trying to fit a mixture of Gaussians for which I already have a pretty good idea of the initial parameters (in this case, I'm generating the distributions myself, so I should always be able to fit them). However, I can't figure out how to force the mean to be e.g. 0 for both Gaussians. Is that possible? Setting m.means_ = ... doesn't work.

from sklearn import mixture
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy import stats

a = np.random.normal(0, 0.2, 500)
b = np.random.normal(0, 2, 800)

obs = np.concatenate([a,b]).reshape(-1,1)
plt.hist(obs, bins = 100, density = True, color = "lightgrey")

min_range = -8
max_range = 8

n_gaussians = 2

m = mixture.GaussianMixture(n_components = n_gaussians)
m.fit(obs)

# Get the Gaussian parameters
weights = m.weights_
means = m.means_
covars = m.covariances_

# Plot all gaussians

plotpoints = np.linspace(min_range, max_range, 1000)

gaussian_sum = []
for i in range(n_gaussians):
    mean = means[i][0]
    sigma = math.sqrt(covars[i][0][0])

    gaussian_points = weights[i] * stats.norm.pdf(plotpoints, mean, sigma)
    gaussian_sum.append(gaussian_points)

    plt.plot(plotpoints, gaussian_points)

sum_gaussian = np.sum(gaussian_sum, axis=0)
plt.plot(plotpoints, sum_gaussian, color = "black", linestyle = "--")
plt.xlim(min_range, max_range)

plt.show()

2 Answers


(Assuming you don't want to *force* the means, but only give an initial guess. Truly fixing them would probably require touching the library code, and it's highly questionable whether the whole EM approach is of any use then; it probably collapses into an optimization problem approachable by scipy's optimize module, as sketched below.)
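For that fully fixed case, here is a minimal sketch (my own, not part of sklearn) that pins both means at 0 and fits only the mixture weight and the two standard deviations by direct maximum likelihood:

import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0, 0.2, 500), rng.normal(0, 2, 800)])

def neg_log_likelihood(params):
    # Mixture of two zero-mean Gaussians; only w, s1, s2 are free.
    w, s1, s2 = params
    pdf = w * stats.norm.pdf(obs, 0, s1) + (1 - w) * stats.norm.pdf(obs, 0, s2)
    return -np.sum(np.log(pdf))

res = optimize.minimize(neg_log_likelihood, x0=[0.4, 0.5, 1.5],
                        bounds=[(1e-3, 1 - 1e-3), (1e-3, None), (1e-3, None)])
w, s1, s2 = res.x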

Just follow the docs: this is supported at GaussianMixture creation time.

weights_init : array-like, shape (n_components, ), optional

The user-provided initial weights, defaults to None. If it is None, weights are initialized using the init_params method.

means_init : array-like, shape (n_components, n_features), optional

The user-provided initial means, defaults to None. If it is None, means are initialized using the init_params method.
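For the question's setup, that could look like the following sketch (note these are only starting points for EM, not constraints; the fit is free to move the means afterwards):

from sklearn import mixture
import numpy as np

# Initial guesses only: EM may still move these during fitting.
m = mixture.GaussianMixture(
    n_components=2,
    weights_init=np.array([0.5, 0.5]),    # shape (n_components,)
    means_init=np.array([[0.0], [0.0]]),  # shape (n_components, n_features)
)
m.fit(obs)  # obs as defined in the question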

  • So it's not possible to force the parameters, or restrict them to certain ranges, instead of just giving initial guesses? – komodovaran_ Dec 20 '17 at 16:49
  • Not without changing the code, as this would defeat the whole EM approach, in my opinion (but I would not consider myself an expert there). – sascha Dec 20 '17 at 16:49

So what I was actually after was known priors, which means the model should instead be fitted with BayesianGaussianMixture, which allows one to set a mean_prior and a mean_precision_prior.

Fitting with

m = mixture.BayesianGaussianMixture(n_components = n_gaussians,
                                    mean_prior = np.array([0.0]),
                                    mean_precision_prior = 1.0)

One can force it to work out even in this case: [figure: the resulting fit, see original post]

  • A prior does not have the effect of fixing those values (except maybe `mean_precision_prior=0`, which is not allowed). – sascha Dec 20 '17 at 17:04
  • Correct, it doesn't fix the values entirely, but it makes them much more likely to land at the priors unless the prior is quite far off (e.g. here, with a sigma of 5 for the broad distribution, the mean was estimated as 0.07). At least from my testing; more advanced cases may be harder to steer in the desired direction. – komodovaran_ Dec 20 '17 at 17:22
  • The docs say for `mean_precision_prior`: "Larger values concentrate the cluster means around `mean_prior`." That means that if you set the precision prior to a high value (not 0), it enforces the prior more strongly, so just set it high and it should effectively fix the means (see the sketch below). – Viktor Tóth Mar 20 '20 at 03:03
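A minimal sketch of that last suggestion (the exact magnitude of mean_precision_prior needed is an assumption and depends on the data):

from sklearn import mixture
import numpy as np

# A large mean_precision_prior concentrates the fitted means around mean_prior.
m = mixture.BayesianGaussianMixture(
    n_components=2,
    mean_prior=np.array([0.0]),   # pull both component means toward 0
    mean_precision_prior=1e6,     # assumed "high" value; tune for your data
)
m.fit(obs)        # obs as defined in the question
print(m.means_)   # both means should now sit very close to 0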