Image first: [image: the original plot of the normal distributions]

As you can see, I have a set of normal distributions. I made these plots for a presentation, and the feedback was that they should be "normalized". As I understood it, that means scaling each plot to a common maximum value, with the shape of each curve adjusted to match. The goal is to make the plots easier to compare visually; I hope this makes sense. I'm using histfit for this plot.

Is there a method I can use to make these graphs more comparable in that way?

EDIT: This was marked as a duplicate, but it isn't really one. I'm not looking for the probability density to fit the histogram; I want to set the same maximum value for each probability density curve. I know about the marked topics; I just couldn't find my answer there.

EDIT2:

These are excerpts from my code, along with the results they produce:

% Bin each dataset: f = counts per bin, x = bin centres
[f1,x1] = hist(data1);
[f2,x2] = hist(data2);
[f3,x3] = hist(data3);

% Normal pdf fitted to data1, evaluated at the sorted data points
avg = mean(data1);
stdev = std(data1);
VERT1 = sort(data1);
y1 = exp(- 0.5 * ((VERT1 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));
y11 = y1/max(data1);   % divides by the largest data value, not by the curve's peak max(y1)

% Same for data2
avg = mean(data2);
stdev = std(data2);
VERT2 = sort(data2);
y2 = exp(- 0.5 * ((VERT2 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));
y22 = y2/max(data2);

% Same for data3
avg = mean(data3);
stdev = std(data3);
VERT3 = sort(data3);
y3 = exp(- 0.5 * ((VERT3 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));
y33 = y3/max(data3);

[image: the resulting plot]

Direct link for clarity: https://i.stack.imgur.com/GBYIz.jpg
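If the goal is literally to give every curve the same maximum, here is a minimal sketch of one way to do it, assuming the same hand-written normal pdf as in the question. The example data and the names `datasets`, `styles` and `peakVals` are hypothetical, introduced only for this illustration. The key difference from the excerpt above is the divisor: `y1/max(data1)` divides by the largest data value, which leaves the peak height arbitrary, whereas dividing a curve by its own maximum puts every plotted peak at exactly 1.

```matlab
% Hypothetical example data; substitute data1, data2, data3
datasets = {300 + 30*randn(100,1), 250 + 10*randn(100,1)};
styles = {'b-', 'r-'};
peakVals = zeros(1, numel(datasets));

figure; hold on;
for k = 1:numel(datasets)
    d = sort(datasets{k});
    avg = mean(d);
    stdev = std(d);
    % Normal pdf fitted to this dataset (same formula as in the question)
    y = exp(-0.5 * ((d - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));
    % Divide by the curve's own peak, not by max(data): the peak becomes 1
    yPeak = y / max(y);
    peakVals(k) = max(yPeak);
    plot(d, yPeak, styles{k});
end
hold off;
```

The rescaled curves no longer enclose an area of 1, so they are no longer probability densities; the y-axis should be labeled accordingly.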

Following the explanation in the suggested duplicate (scaling each histogram to unit area), I get this:

% Bin each dataset
[f1,x1] = hist(data1);
[f2,x2] = hist(data2);
[f3,x3] = hist(data3);

% Normal pdf fitted to each dataset, evaluated at the sorted data points
avg = mean(data1);
stdev = std(data1);
VERT1 = sort(data1);
y1 = exp(- 0.5 * ((VERT1 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));

avg = mean(data2);
stdev = std(data2);
VERT2 = sort(data2);
y2 = exp(- 0.5 * ((VERT2 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));

avg = mean(data3);
stdev = std(data3);
VERT3 = sort(data3);
y3 = exp(- 0.5 * ((VERT3 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));

% Scale each histogram to (approximately) unit area, like the pdf curves,
% then overlay the fitted curves; a single hold on covers all of them
h1 = bar(x1,f1/trapz(x1,f1)); hold on;
h2 = bar(x2,f2/trapz(x2,f2),'r');
h3 = bar(x3,f3/trapz(x3,f3),'g');
plot(VERT1,y1,'b-');
plot(VERT2,y2,'r-');
plot(VERT3,y3,'g-'); hold off;

Which results in: [image: the resulting plot]
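A note on the `trapz` step: `trapz(x1,f1)` integrates over the bin centres, which counts the first and last bins only by half, so the bar areas add up to 1 only approximately. A minimal sketch of an exact alternative, assuming the equally spaced bins that `hist` returns, divides the counts by (number of samples times bin width). The example data and the names `binWidth` and `fArea` are hypothetical.

```matlab
% Hypothetical example data; substitute data1, data2, data3
data1 = 300 + 30*randn(100,1);

[f1, x1] = hist(data1);              % counts per bin and bin centres (10 bins)
binWidth = x1(2) - x1(1);            % hist returns equally spaced centres
fArea = f1 / (sum(f1) * binWidth);   % bar areas now sum to exactly 1

% Fitted normal pdf, as in the question
avg = mean(data1);
stdev = std(data1);
VERT1 = sort(data1);
y1 = exp(-0.5 * ((VERT1 - avg) / stdev) .^ 2) / (stdev * sqrt(2 * pi));

bar(x1, fArea, 1); hold on;          % bar width 1: adjacent bars touch
plot(VERT1, y1, 'b-'); hold off;
```

In MATLAB R2014b and later, `histogram(data1, 'Normalization', 'pdf')` applies the same count / (n times bin width) scaling directly.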

Hope this explains everything.

  • You can divide each histogram by its maximum value... – Ander Biguri Nov 25 '14 at 15:15
  • Why can't you just scale them? [normalisation of audio signal and reverting to original matlab](http://stackoverflow.com/q/22894559/2545927) might give you some hints. – kkuilla Nov 25 '14 at 15:16
  • To normalize a histogram (or pdf), you need to make sure its _area_ is 1. See for example [here](http://stackoverflow.com/questions/5320677/how-to-normalize-a-histogram-in-matlab) – Luis Mendo Nov 25 '14 at 15:19
  • @kkuilla When I try to do it (h1(2) = h1(2)/max(data)), I get an error that MATLAB doesn't recognize the object. I think it changes the type? – Krystian Meresiński Nov 26 '14 at 11:16
  • This question seems to have an answer already. Is there any reason why that answer is not helpful to you? – kkuilla Nov 26 '14 at 11:23
  • Yes. [This](http://stackoverflow.com/questions/5320677/how-to-normalize-a-histogram-in-matlab) answer gives me [this result](http://imgur.com/CGFK25Y), which is not what I'm looking for. I'm looking for a way to flatten the plots to one maximum value, but the method by @Ander doesn't work; here is the [pdf scaled down by max value](http://imgur.com/sdTMNmb) – Krystian Meresiński Nov 26 '14 at 11:47
  • @KrystianMeresiński: Maybe you should show us what you want instead of letting us guess and fail... How about a little freehand before/after drawing of the curve you're looking for? – Jean-François Corbett Nov 26 '14 at 11:55
  • @KrystianMeresiński I'm pretty sure the plot you linked as "my answer" is not what I suggested. – Ander Biguri Nov 26 '14 at 12:19
  • @Jean-FrançoisCorbett I was imagining something like [this](http://imgur.com/vdNrX7J,9zdjF19#0): the distributions changed so that the plots share one maximum value. Again, I'm not even sure this makes sense, but it seems that scaling the histograms would do the trick; it doesn't work, though, as shown in my previous comment. – Krystian Meresiński Nov 26 '14 at 12:23
  • You may not realize it, but this *is* indeed a duplicate of [that other question](http://stackoverflow.com/questions/5320677/how-to-normalize-a-histogram-in-matlab); the same answer applies here. As @LuisMendo wrote above, the areas of all probability density curves should be equal to 1. You should heed the wise advice that has been given you. Don't start messing with the width or mean of your distributions to make them "look nicer". – Jean-François Corbett Nov 26 '14 at 12:45

1 Answer

What you have are two plots with non-zero means and non-unit standard deviations. Such distributions are hard to compare. What normalization means in this context (as far as I can tell) is making the fitted bell curve have mean 0 and standard deviation 1. This can be achieved quite simply. Here is a toy example:

clf;
data1 = random('normal',300,30,100,1); %Randomly generated first dataset
data2 = random('normal',250,10,100,1); %Randomly generated second dataset
h1=histfit(data1); %Plot the data
hold on;
h2=histfit(data2);
delete(h1(1)); % h(1) is the bar series: remove the bars, keep the fitted curves
delete(h2(1));
set(h2(2),'color','b') % recolor the second curve so the two can be told apart

This yields:

[image: the two fitted curves]

To normalize, simply replace the data you're fitting with normalized data:

h1=histfit( (data1-mean(data1)) / std(data1) );
h2=histfit( (data2-mean(data2)) / std(data2) );

To yield: [image: the two fitted curves after normalization]

making the comparison of the graphs much cleaner.
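Written out, the normalization used here is z = (x - mean(x)) / std(x) for each dataset. A minimal sketch to check what it does, using toy data like the answer's but generated with base-MATLAB `randn` (`300 + 30*randn(100,1)` draws from the same distribution as `random('normal',300,30,100,1)`); the names `z1` and `z2` are introduced only for this illustration:

```matlab
% Toy data drawn from the same distributions as in the answer,
% using randn instead of the Statistics Toolbox function random
data1 = 300 + 30*randn(100,1);
data2 = 250 + 10*randn(100,1);

% Standardize: subtract the mean, divide by the standard deviation
z1 = (data1 - mean(data1)) / std(data1);
z2 = (data2 - mean(data2)) / std(data2);

% Standardized data has mean 0 and standard deviation 1 (up to rounding)
fprintf('z1: mean %.2e, std %.4f\n', mean(z1), std(z1));
fprintf('z2: mean %.2e, std %.4f\n', mean(z2), std(z2));
```

Both printed means are zero up to rounding error and both standard deviations are 1, so the fitted normals share the same mean and spread; what remains visible is how each dataset's shape departs from that, such as skewness or heavy tails.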

Nitish
  • I got this as a result: http://imgur.com/ZYYtknp. It is clearer, so thank you, but it is not the normalization I was looking for. – Krystian Meresiński Nov 26 '14 at 11:08
  • So you've removed the distributions' mean and variance... What is the meaning of the result?! Most of the information is gone! – Jean-François Corbett Nov 26 '14 at 12:47
  • @Jean-FrançoisCorbett, there are many reasons why this might be useful. Suppose I have a process that I suspect is Gaussian but am not sure. By putting it in standard normal form, I am in a much better position to assess this graphically (of course, the right way would be through QQ plots). In my opinion, things like kurtosis and skewness are much easier to see graphically in this format. Or suppose I have data on the number of Coke and Diet Coke bottles consumed in a day: if I normalize both, I'd probably be in a better position to assess the tendencies around the mean. – Nitish Nov 26 '14 at 16:37