1

I have a vector of data consisting of about 2 million samples that I suspect is a mixture of two gaussian's. I try to fit the data, Data, to a mixture using matlab's fitgmdist.

From histogram:

% histogram counts of X with 1000 bins.
[Yhist, x] = histcounts(Data, 1000, 'normalization', 'pdf');
x = (x(1:end-1) + x(2:end))/2;

Using fitgmdist:

% Increase no. of iterations. default is 100.
opts.MaxIter = 300;

% Ensure that it does all iterations.
opts.TolFun = 0;

GMModel = fitgmdist(Data, 2, 'Options', opts, 'Start', 'plus');
wts = GMModel.ComponentProportion;
mu = GMModel.mu;
sig = sqrt(squeeze(GMModel.Sigma));
Ygmfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));

Mixture results with fitgmdist: wts = [0.6780, 0.322], mu = [-7.6444, -9.7831], sig = [0.8243, 0.5947]

Next I try using nlinfit:

% Define the callback function for nlinfit.
function y = nmmix(a, x)
   a(1:2) = a(1:2)/sum(a);
   y = a(1)*normpdf(x(:), a(3), a(5)) + a(2)*normpdf(x(:), a(4), a(6));
end

init_wts = [0.66, 1-0.66];
init_mu = [-7.7, -9.75];
init_sig = [0.5, 0.5];
a = nlinfit(x(:), Yhist(:), @nmmix, [init_wts, init_mu, init_sig]);
wts = a(1:2)/sum(a(1:2));
mu = a(3:4);
sig = a(5:6);
Ynlinfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));

Mixture results with nlinfit: wts = [0.6349, 0.3651], mu = [-7.6305, -9.6991], sig = [0.6773, 0.6031]

% Plot to compare the results
figure;
hold on
plot(x(:), Yhist, 'b');
plot(x(:), Ygmfit, 'k');
plot(x(:), Ynlinfit, 'r');

blue:Histogram of data, red:Non-linear fit, black:Matlab fitgmdist

It seems to be that the non-linear fit (red curve) is intuitively a better approximation to the histogram (blue curve) than "fitgmdist" (black curve). The results are similar even if I use a finer histogram, say with 100,000 bins.

What can be the source of this discrepancy?

Added later: Of course one would not expect the results to be the same, but I would expect that the visual quality of the two fits would be comparable.

skm
  • 329
  • 2
  • 9
  • 1
    You use two different algorithms, so I'm not surprised at all to see that they yield different results. Check the mathematical working of both to see where the difference stems from. – Adriaan Sep 01 '17 at 12:53
  • @adriaan, I'm not concerned about the difference, which is to be expected, but I'm surprised that nlinfit appears to give a better fit. – skm Sep 01 '17 at 13:24

0 Answers0