
I have an image whose intensity histogram looks like a mixture of two Gaussian distributions. I want to segment the image into two regions so that each region's intensities follow a normal distribution, like the red and blue curves shown in the histogram. I know a Gaussian mixture model can potentially do this. I tried fitting one with the fitgmdist function and then clustering the pixels into two parts, but it still does not work well. Any suggestions would be appreciated.

[Four images, including the histogram with the red and blue curves]

Below is the MATLAB code for my approach.

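% Read the image and convert it to grayscale (skip rgb2gray if it is already gray)
I = imread('demo.png');
if size(I,3) == 3
    I = rgb2gray(I);
end
data = im2double(I(:));   % double precision, intensities scaled to [0,1]

% Fit a two-component Gaussian mixture model and assign each pixel to a component
obj = fitgmdist(data,2);
idx = cluster(obj,data);
cluster1 = data(idx == 1,:);
cluster2 = data(idx == 2,:);

% Display the histogram of each cluster in its own figure
figure; histogram(cluster1)
figure; histogram(cluster2)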
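The question asks for two image regions, while the code above only splits the intensity values. A minimal sketch of mapping the per-pixel labels from `cluster` back to image shape, assuming a synthetic two-tone test image in place of `demo.png` (its means, widths, and size are made-up values) and the Statistics and Image Processing Toolboxes for `fitgmdist` and `imshow`. The relabelling step is there because `fitgmdist` numbers its components arbitrarily.

```matlab
% Hypothetical 100x100 test image: darker left half, brighter right half.
rng(0);
I = [0.3 + 0.05*randn(100,50), 0.7 + 0.08*randn(100,50)];
I = min(max(I, 0), 1);            % clip to the [0,1] range of a double image
data = I(:);

% Two-component 1-D Gaussian mixture on the pixel intensities.
obj = fitgmdist(data, 2, 'Replicates', 3);
idx = cluster(obj, data);         % hard label (1 or 2) for every pixel

% Component numbering is arbitrary; relabel so that 1 = darker component.
[~, order] = sort(obj.mu);
relabel = zeros(1, 2);
relabel(order) = 1:2;

% Put the labels back into image shape to get the two regions.
labels = reshape(relabel(idx), size(I));
mask1 = labels == 1;              % darker region
mask2 = labels == 2;              % brighter region

figure;
subplot(1,3,1); imshow(I);     title('image');
subplot(1,3,2); imshow(mask1); title('component 1');
subplot(1,3,3); imshow(mask2); title('component 2');
```

Because each pixel is labelled by intensity alone, the masks can be speckled where the two components overlap; any spatial smoothing would be an extra step beyond this sketch.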
SimaGuanxing

2 Answers


Your solution is correct

The way you are displaying your histogram poorly represents the detected distributions.

  1. Use the same bin edges for both clusters, because histogram reports raw counts per bin and otherwise picks different bins for each data set
  2. Make the axes limits consistent (or plot both on the same axes)

These two small changes show that you're actually getting a pretty good distribution fit.

histogram(cluster1,0:.01:1); hold on;
histogram(cluster2,0:.01:1);

[Image: histograms of the two clusters on shared bins]
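The edges `0:.01:1` assume the intensities are already scaled to [0,1] (for example with `im2double`); with raw `uint8` data from `rgb2gray`, values above 1 fall outside those edges and are not counted. A sketch that instead derives one shared set of edges from both clusters, using synthetic stand-in data (the means, widths, and sizes are assumptions):

```matlab
% Hypothetical clusters standing in for cluster1/cluster2 from the question.
rng(1);
cluster1 = 0.3 + 0.05*randn(5000,1);
cluster2 = 0.7 + 0.08*randn(2000,1);

% One set of bin edges spanning both clusters, so bar heights are comparable.
allData = [cluster1; cluster2];
edges = linspace(min(allData), max(allData), 101);   % 100 equal-width bins

figure;
h1 = histogram(cluster1, edges); hold on;
h2 = histogram(cluster2, edges);
hold off;
legend('cluster 1', 'cluster 2');
```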

Re-fit a Gaussian curve to each cluster

Once you have your clusters, you can treat them as independent distributions and fit a normal curve to each one; this smooths the tails where the two distributions merge.

gcluster1 = fitdist(cluster1,'Normal');
gcluster2 = fitdist(cluster2,'Normal');

x_values = 0:.01:1;
y1 = pdf(gcluster1,x_values);
y2 = pdf(gcluster2,x_values);
plot(x_values,y1);hold on;
plot(x_values,y2);

[Image: fitted normal PDFs of the two clusters]
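One way to sanity-check the re-fit is to print the mixture's component parameters next to the `fitdist` re-fit of each hard cluster, together with the cluster's skewness. A sketch on synthetic overlapping data (all parameters hypothetical), assuming the Statistics and Machine Learning Toolbox for `fitgmdist`, `fitdist`, and `skewness`:

```matlab
% Hypothetical data: two overlapping normal populations.
rng(2);
data = [0.35 + 0.06*randn(6000,1); 0.6 + 0.08*randn(3000,1)];

obj = fitgmdist(data, 2, 'Replicates', 3);
idx = cluster(obj, data);

for k = 1:2
    ck = data(idx == k);
    gk = fitdist(ck, 'Normal');          % normal re-fit to the hard cluster
    fprintf(['component %d: GMM mu = %.4f, sigma = %.4f | ' ...
             're-fit mu = %.4f, sigma = %.4f | skewness = %.3f\n'], ...
            k, obj.mu(k), sqrt(obj.Sigma(:,:,k)), gk.mu, gk.sigma, skewness(ck));
end
```

Since `cluster` makes a hard assignment, each cluster is cut off at the decision boundary, so the re-fit `sigma` can come out smaller than the mixture component's and the skewness can drift away from zero when the components overlap strongly; close agreement suggests the re-fit is reasonable.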

Brendan Frick
  • Thanks so much for your help, Brendan. I have one more question. I know very little about the fitdist function in MATLAB. If the data itself is skewed, can we still fit a normal distribution to it? In other words, is it fair to use fitdist at this point? – SimaGuanxing Jun 21 '17 at 18:31
  • Also, for the pdf plot, what does the y-axis unit (max 10) mean? I processed another data set and found that the ratio of the two peaks in the pdf plot differs from the original histogram. Is there any way to convert the pdf plot back to a histogram where y is the pixel count? Thanks! – SimaGuanxing Jun 21 '17 at 19:24
  • The problem is that `fitgmdist()` computes a very narrow distribution to be more robust (to the point where there is no overlap). When we extend the range to include the overlap between the distributions, we are undoubtedly including skew from the non-normal convergence. However, if `fitgmdist()` was able to identify the Gaussian distributions in the first place, we can assume that each distribution is robust enough to be re-identified semi-accurately even with tail noise. You can always compare the means from `fitgmdist()` and `fitdist()` and see whether they agree. – Brendan Frick Jun 21 '17 at 19:33
  • I think you could normalize (`y1 = y1./sum(y1)`) and scale (`y1 = y1.*numel(I(:))`) – Brendan Frick Jun 21 '17 at 19:37
  • Thanks Brendan! Sorry, I may be slow to understand this, but the peak values (frequencies) still differ. For example, with imhist(I(:)) the peaks are around 900 and 345, but on the scaled pdf they are around 9000 and 2000. Let me know. Thanks again! – SimaGuanxing Jun 21 '17 at 19:54
  • What I mean is, if I can convert the pdf into the same kind of figure that imhist gives, I can compare the results easily. imhist uses 256 bins, but for the pdf I don't know how to use exactly the same bins as imhist. – SimaGuanxing Jun 21 '17 at 20:05
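On the y-axis question: a PDF is a density per unit intensity, so its area (not its height) is 1, and a narrow curve on a [0,1] axis can peak near 10. Each cluster's curve also has unit area regardless of how many pixels it holds, which is why the peak ratio differs from the histogram. The expected count in a bin is approximately n_k × pdf(x) × bin width, where n_k is the number of pixels in that cluster (not `numel(I)`). A sketch, assuming intensities in [0,1] and that `imhist` on a double image uses 256 bins centred on `linspace(0,1,256)` (worth checking against the second output of `[counts, x] = imhist(I)`); the cluster data are synthetic stand-ins and the normal density is written out by hand rather than via `pdf`:

```matlab
% Hypothetical stand-ins for cluster1/cluster2 (intensities already in [0,1]).
rng(0);
cluster1 = 0.3 + 0.05*randn(20000,1);
cluster2 = 0.7 + 0.08*randn(5000,1);

% Assumed imhist-style bin layout: 256 bins centred on linspace(0,1,256),
% so each bin is 1/255 wide.
nbins    = 256;
centers  = linspace(0, 1, nbins);
binWidth = 1/(nbins - 1);
edges    = [centers - binWidth/2, 1 + binWidth/2];

% Observed pixel counts per bin for each cluster.
counts1 = histcounts(cluster1, edges);
counts2 = histcounts(cluster2, edges);

% Normal fit for each cluster (sample mean and standard deviation).
mu1 = mean(cluster1);  s1 = std(cluster1);
mu2 = mean(cluster2);  s2 = std(cluster2);
normalPdf = @(x, mu, s) exp(-(x - mu).^2 ./ (2*s^2)) ./ (s*sqrt(2*pi));

% Density -> expected count per bin: n_k * pdf(x) * binWidth.
expected1 = numel(cluster1) * normalPdf(centers, mu1, s1) * binWidth;
expected2 = numel(cluster2) * normalPdf(centers, mu2, s2) * binWidth;

figure;
bar(centers, counts1 + counts2, 1); hold on;
plot(centers, expected1, 'r', centers, expected2, 'b', 'LineWidth', 1.5);
hold off;
xlabel('intensity'); ylabel('pixel count');
```

With real data, the stand-ins would be replaced by `cluster1`/`cluster2`, and `mu1`, `s1`, ... by `gcluster1.mu`, `gcluster1.sigma`, and so on.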

How are you trying to use this 'model'? If the data is constant, why don't you measure the means/variances of the two Gaussians separately?

And if you are trying to generate new values from this mixed distribution, then you can look into a mixture model with weights given to each of the above distributions.
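For the generative use, MATLAB's `gmdistribution` object takes component means, covariances, and mixing weights, and can both sample from and evaluate the mixture. A sketch with hypothetical parameters (in practice they could come from `obj.mu`, `obj.Sigma`, and `obj.ComponentProportion` of a `fitgmdist` result):

```matlab
% Hypothetical component parameters for a 1-D, two-component mixture.
mu    = [0.3; 0.7];                 % component means, k-by-d (here 2-by-1)
sigma = cat(3, 0.05^2, 0.08^2);     % variances, d-by-d-by-k
p     = [0.8, 0.2];                 % mixing weights, sum to 1

gm = gmdistribution(mu, sigma, p);

rng(3);
samples = random(gm, 10000);        % 10000-by-1 draws from the mixture

figure;
histogram(samples, 'Normalization', 'pdf'); hold on;
x = linspace(0, 1, 400)';
plot(x, pdf(gm, x), 'LineWidth', 1.5);   % mixture density for comparison
hold off;
```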

kri
  • Because the two Gaussians are mixed together, I cannot directly measure their means/standard deviations separately. That's why I need to separate the curves. – SimaGuanxing Jun 21 '17 at 17:58