For me the best option was to discard that column. It was OK in my application, but may not be for you.
Here's the bit of the gmdistribution class definition that checks for that condition and produces the error:
    varX = var(X);
    I = find(varX < eps(max(varX))*n);
    if ~isempty(I)
        error('stats:gmdistribution:ZeroVariance',...
              'The following column(s) of data are effectively constant: %s.', num2str(I));
    end
where X is the multivariate data passed to the fit method. Its test for 'effectively zero variance' combines eps, which gives the spacing between adjacent floating-point numbers at a given magnitude for the current datatype (single or double; eps is not defined for integer types such as uint8), with n, the number of rows in your data.
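To get a feel for the size of that threshold, here is a small illustration of how eps scales with magnitude and datatype. The row count n = 100 and the maximum variance of 1e12 are made-up numbers for the example, not values from your data.

```matlab
% Spacing of floating-point numbers at different magnitudes and precisions.
epsDouble = eps(1);          % 2^-52, about 2.2204e-16
epsSingle = eps(single(1));  % 2^-23, about 1.1921e-07
epsLarge  = eps(1e12);       % 2^-13, about 1.2207e-04

% Hypothetical case: 100 rows, largest column variance of 1e12.
n = 100;
maxVar = 1e12;
threshold = eps(maxVar) * n; % about 1.2e-02

% Any column whose variance falls below this would be reported as
% 'effectively constant' by the check quoted above.
fprintf('threshold = %g\n', threshold);
```

So a column with a perfectly respectable variance in absolute terms can still be flagged if another column's variance is many orders of magnitude larger.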
So one approach is to reimplement that test and do something about it before gmdistribution.fit throws the error. If the variance of the data is so low that it's considered zero then there's nothing to be gained from its inclusion and thus there's no harm in discarding that column and carrying on fitting with the ones that are left.
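A minimal sketch of that idea, assuming X is an n-by-d numeric matrix with one observation per row. The example data, the variable names (lowVar, keptCols, Xreduced) and the number of components k are all made up for illustration; only the threshold expression comes from the excerpt above.

```matlab
% Hypothetical data: two informative columns plus one constant column.
rng(0);                               % fixed seed so the example is repeatable
X = [randn(100, 2), ones(100, 1)];

X = double(X);                        % make sure we are working in double precision
n = size(X, 1);                       % number of rows (observations)

% Same criterion as the check inside gmdistribution.
varX   = var(X);
lowVar = varX < eps(max(varX)) * n;

% Keep a record of which columns survive, then drop the rest.
keptCols = find(~lowVar);
Xreduced = X(:, keptCols);

if any(lowVar)
    fprintf('Dropping effectively constant column(s): %s\n', num2str(find(lowVar)));
end

% The fit would then be run on the reduced data, e.g. (k is your choice):
% k  = 2;
% gm = gmdistribution.fit(Xreduced, k);
```

Holding on to keptCols lets you map the fitted component means and covariances back to the original column numbering afterwards.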
From the looks of your example that would be half your dataset. This may not be ideal, but it's not uncommon in multivariate analysis to find that a subset of your variables contains the majority of the variance (cf. Pareto). You could do principal component analysis first to discard some of those prior to the GMM fit, though the above test is effectively doing that already.
If you absolutely have to include those columns then you may be able to do some other processing on them to raise the variance. First I would make sure that the values are being stored in a datatype that has enough precision to represent them, though that's usually handled fairly well automatically by MATLAB.
If the mean values of these low-variance columns are enough orders of magnitude different from the other columns (note that the above test is relative to the eps of the maximum of all the columns' variances) then that will give rise to a relative disparity which you might be able to reduce with some judicious normalisation.
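As a rough sketch of what that normalisation could look like, the snippet below standardises each column to zero mean and unit variance and reruns the same test. The data are invented so that one column is flagged purely because of scale; whether standardising is appropriate for your analysis is something you'd have to judge, and it can't rescue a column that is truly constant (its standard deviation is zero).

```matlab
% Hypothetical data: column 1 has a huge spread, column 2 a tiny one.
rng(0);
n = 100;
X = [1e6 * randn(n, 1), 1 + 1e-6 * randn(n, 1)];

% The original check flags column 2 because of the scale disparity.
varX = var(X);
flaggedBefore = varX < eps(max(varX)) * n;

% Standardise only the columns that have a non-zero standard deviation.
mu = mean(X, 1);
sd = std(X, 0, 1);
ok = sd > 0;
Xn = X;
Xn(:, ok) = bsxfun(@rdivide, bsxfun(@minus, X(:, ok), mu(ok)), sd(ok));

% After rescaling, every standardised column has variance close to 1.
varXn = var(Xn);
flaggedAfter = varXn < eps(max(varXn)) * n;
```

Here bsxfun is used rather than implicit expansion so that the sketch also runs on older MATLAB releases. Remember that the fitted means and covariances will then be in standardised units, so mu and sd are needed to convert them back.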
And if all that fails then maybe you have to go back to the acquisition source and improve the SNR. If that's an MRI machine then I wish you the best of luck...