Here you can see the plot of the newly fit model:
The bins show all the data now available, i.e. both the initial data used to fit the model and the new data. The new data does not include the higher values. These are the model parameters:
GaussianMixture(max_iter=10000, n_components=2, tol=0.0001, warm_start=True)
So warm_start is definitely set to True. When sampling from the model, I also do not get the high values, so it does not seem to be an error in the plot either.
When fitting the model, which is called gmm, with new data, I simply do:
gmm_new = gmm.fit(new_data)
The new data is already expanded in dimensions so that this works. When I fit the model again with the new AND old data, i.e. the whole dataset, the results look fine. But wouldn't that mean that I fitted the old data twice? Am I using warm_start wrong?
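To make the situation reproducible, here is a minimal sketch of the same steps with synthetic data standing in for mine. The arrays old_data and new_data, the cluster locations, and random_state are assumptions made up for illustration; only the GaussianMixture parameters and the second fit call match what I described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical stand-in data (not my real dataset):
# the old batch contains a low and a high cluster, the new batch only low values.
old_data = np.concatenate(
    [rng.normal(0.0, 1.0, 500), rng.normal(10.0, 1.0, 500)]
).reshape(-1, 1)  # expand to shape (n_samples, 1)
new_data = rng.normal(0.0, 1.0, 500).reshape(-1, 1)

# Same parameters as in the question; random_state is added only for repeatability.
gmm = GaussianMixture(
    n_components=2, max_iter=10000, tol=0.0001, warm_start=True, random_state=0
)

# First fit on the old data.
gmm.fit(old_data)
means_after_old = gmm.means_.ravel().copy()

# Second fit on the new data only; warm_start reuses the previous
# solution as the starting point for this call.
gmm_new = gmm.fit(new_data)
means_after_new = gmm_new.means_.ravel().copy()

print("same object:", gmm_new is gmm)  # fit() returns the estimator itself
print("means after fitting old data:", means_after_old)
print("means after fitting new data:", means_after_new)
```

In this sketch the second call only ever sees new_data, and gmm_new is the same object as gmm rather than a copy. As far as I understand the scikit-learn documentation, warm_start only reuses the previous solution as the initialization for the next fit() and does not keep the earlier samples, which would be consistent with the high values disappearing.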