4

I am using Gaussian kernel to estimate a pdf of a data based on the equation enter image description here where K(.) is Gaussian kernel, data is a given vector. z is bin from 1 to 256. size of bin is 1.

I implemented by matlab code. However, the result show the amplitude of my pdf estimation (blue color) is not similar with real pdf of data. Could you see my code and give me some comment about my code?

MATLAB CODE

function pdf_est=KDE()
close all;
%%Random values of 20 pixels, range=[1 256]
data=randi([1 256],1,20);

%% Estimate histogram%%%%% 
pdf_est=zeros(1,256);
z=256;

for i=1:z
    for j=1:length(data)
        pdf_est(i)=pdf_est(i)+Gaussian(i-data(j));
    end
end
%% Plot real histogram 1 to 256; binsize=1;
hold on
plot(imhist(uint8(data))./length(data),'r');
%% Plot histogram estimation
plot(pdf_est./length(data),'b');
hold off
function K=Gaussian(x)
   sigma=1;
   K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));

RESULT BLUE is my result and RED is real pdf enter image description here

Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
user3051460
  • 1,455
  • 22
  • 58

1 Answers1

3

You have two issues:

  1. A 1-unit displacement between blue and red plots.
  2. The blue spikes are wider and less tall than the red ones.

How to solve each issue:

  1. This is caused by a possible confusion between the data range 0,...,255 and the indexing interval 1,...,256. Since your data represents an 8-bit image, values should be 0,...,255 (not 1,...,256). Your plotted horizontal axis should then be 0,...,255. Same goes for the i variable in the for line. And then, since Matlab indexing starts at 1, you should use i+1 when indexing pdf_est.

  2. This is normal behaviour. You are assuming unit variance in your kernel. To see taller blue spikes you could reduce sigma to make the kernel narrower and taller. But you will never get the exact same height as your data (the necessary sigma would depend on your data).

    Actually, you have a tradeoff between height and width, controlled by sigma. But the important thing is that the area remains the same for any sigma. So I suggest plotting the CDF (area) instead of the pdf (area densisty). To do that, plot the accumulated histogram and pdf (using cumsum).

Code modified according to 1:

function pdf_est=KDE()
close all;
%%Random values of 20 pixels, range=[1 256]
data=randi([1 256],1,20)-1; %// changed: "-1"

%% Estimate histogram%%%%% 
pdf_est=zeros(1,256);
z=256;

for i=0:z-1 %// changed_ subtracted 1 
    for j=1:length(data)
        pdf_est(i+1)=pdf_est(i+1)+Gaussian(i-data(j)); %// changed: "+1" (twice)
    end
end
%% Plot real histogram 1 to 256; binsize=1;
hold on
plot(0:255, imhist(uint8(data))./length(data),'r'); %// changed: explicit x axis
%% Plot histogram estimation
plot(0:255, pdf_est./length(data),'b'); %// changed: explicit x axis
hold off
function K=Gaussian(x)
   sigma=1; %// change? Set as desired
   K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));

enter image description here

Code modified according to 1 and 2:

function pdf_est=KDE()
close all;
%%Random values of 20 pixels, range=[1 256]
data=randi([1 256],1,20)-1; %// changed: "-1"

%% Estimate histogram%%%%% 
pdf_est=zeros(1,256);
z=256;

for i=0:z-1 %// changed: subtracted 1 
    for j=1:length(data)
        pdf_est(i+1)=pdf_est(i+1)+Gaussian(i-data(j)); %// changed: "+1" (twice)
    end
end
%% Plot real histogram 1 to 256; binsize=1;
hold on
plot(0:255, cumsum(imhist(uint8(data))./length(data)),'r'); %// changed: explicit x axis
                                                            %// changed: cumsum
%% Plot histogram estimation
plot(0:255, cumsum(pdf_est./length(data)),'b'); %// changed: explicit x axis
                                                %// changed: cumsum
hold off
function K=Gaussian(x)
   sigma=1; %// change? Set as desired
   K=1./(sqrt(2*pi)*sigma)*exp(-x^2./(2*sigma^2));

enter image description here

Community
  • 1
  • 1
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • @user3051460 Glad I could help with that! It was an interesting question – Luis Mendo Feb 16 '15 at 13:17
  • Yes. Actually, we have a function that similar idea in http://kr.mathworks.com/help/stats/ksdensity.html;jsessionid=1f5f20f4d8420e7ec67e6dc2a7a7. However, in my case, the above formula is more simple and suitable for image segmentation. But when I run above code, it takes long time (because we have two loop). Could you see my formula and tried optimize for reduce computation time? – user3051460 Feb 16 '15 at 13:58
  • @user3051460 I think some of the loops (maybe both) can be eliminated by vectorization. But it would be confusing to do that here, as that's not what was being asked in your original question. I suggest you create a separate question for that. Choose the modified code version you want to use, post it, and ask for it to be vectorized. I think it can be easily done. Someone (perhaps myself if I have time) will surely help with that. – Luis Mendo Feb 16 '15 at 15:32
  • Yes. I posted it at http://stackoverflow.com/questions/28545361/optimize-computation-time-for-pdf-approximation-based-on-kernel-density-estimati – user3051460 Feb 16 '15 at 15:56