
I have to plot 10 frequency distributions on one graph. In order to keep things tidy, I would like to avoid making a histogram with bins and would prefer having lines that follow the contour of each histogram plot.

I tried the following

[counts, bins] = hist(data);
plot(bins, counts)

But this gives me a very inexact and jagged line.

I read about ksdensity, which gives me a nice curve, but it changes the scaling of my y-axis and I need to be able to read the frequencies from the y-axis.

Can you recommend anything else?

Kaly

2 Answers


You're using the default number of bins for your histogram and, I will assume, for your kernel density estimation calculations.

Depending on how many data points you have, the default will almost certainly not be optimal, as you've discovered. The first thing to try is to calculate the optimum bin width, which gives the smoothest curve while preserving the underlying PDF as well as possible (see also here, here, and here).
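If no dedicated optimum-bin-width helper is at hand, one possible stand-in is a built-in binning rule. The sketch below assumes MATLAB R2014b or later, where histcounts is available, and uses the Freedman-Diaconis rule. That rule is a substitute chosen for illustration, not the optimisation referred to above, and the random data is only a placeholder.

```matlab
% Sketch: let a built-in rule pick the bins instead of the default 10 bins.
% Assumption: R2014b+ (histcounts). 'fd' = Freedman-Diaconis rule, used here
% as a stand-in for a true optimum-bin-width calculation.
data = randn(1, 1e4);                        % placeholder data

[counts, edges] = histcounts(data, 'BinMethod', 'fd');
centers = edges(1:end-1) + diff(edges)/2;    % bin centers, for a line plot

plot(centers, counts, '.-')
xlabel('value'), ylabel('count per bin')
```

Because histcounts returns bin edges rather than centers, the centers have to be computed before plotting a line.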

If you still don't like how smooth the resulting plot is, you could try using the bins output from hist as a further input to ksdensity. Perhaps something like this:

[kcounts,kbins] = ksdensity(data,bins,'npoints',length(bins));

I don't have your data, so you may have to play with the parameters a bit to get exactly what you want.

Alternatively, you could try fitting a spline through the points that you get from hist and plotting that instead.

Some code:

data = randn(1,1e4);

optN = sshist(data);

figure(1)
[N,Center] = hist(data);
[Nopt,CenterOpt] = hist(data,optN);
[f,xi] = ksdensity(data,CenterOpt);

dN = mode(diff(Center));
dNopt = mode(diff(CenterOpt));

plot(Center,N/dN,'.-',CenterOpt,Nopt/dNopt,'.-',xi,f*length(data),'.-')
legend('Default','Optimum','ksdensity')

The result:

[Figure: Different styles of histogram]

Note that the "optimum" bin width preserves some of the fine structure of the distribution (I had to run this a couple of times to get the spikes), while ksdensity gives a smooth curve. Depending on what you're looking for in your data, that may be either good or bad.
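In the plot above the y-axis is a count per unit of x. If the goal is to read raw counts per bin from the y-axis, as the question asks, one option is to scale the density by the number of points times the bin width. This is a minimal sketch; the bin count of 30 and the 200 evaluation points are arbitrary values chosen for illustration.

```matlab
% Sketch: scale ksdensity so the y-axis reads "expected points per bin".
data  = randn(1, 1e4);                     % placeholder data
nbins = 30;                                % illustrative bin count (assumption)

[counts, centers] = hist(data, nbins);     % raw counts per bin
binWidth = centers(2) - centers(1);        % hist uses equal-width bins here

xi = linspace(min(data), max(data), 200);  % points at which to evaluate the density
f  = ksdensity(data, xi);                  % density estimate, area is about 1
expectedCounts = f * numel(data) * binWidth;

plot(centers, counts, '.', xi, expectedCounts, '-')
legend('hist counts', 'scaled ksdensity')
ylabel('count per bin')
```

The scaled curve depends on the chosen bin width, so the same width has to be used for every distribution that shares the axis.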

craigim
  • Note that the y axis scaling is always going to depend on the bin width. `ksdensity` will return a curve that is normalized to an area of 1, so you could rescale it by multiplying by `length(data)` such that the y-axis is instead proportional to the number of points. – craigim May 26 '14 at 21:27
  • how exactly do you rescale by multiplying by `length(data)`? – cosmictypist Aug 12 '15 at 14:05
  • I'm not sure I understand your question. In the example code above, I did the multiplication in the second to last line: `f*length(data)` – craigim Aug 12 '15 at 15:47
  • Also notice that both histograms are divided by the bin width to turn them into a count per unit of x, so that the integral of each histogram curve equals the total number of objects. Multiplying the `ksdensity` output by the total number of objects makes its integral match that of the others. – craigim Aug 12 '15 at 16:36

How about interpolating with splines?

nbins = 10; %// number of bins for original histogram
n_interp = 500; %// number of values for interpolation
[counts, bins] = hist(data, nbins);
bins_interp = linspace(bins(1), bins(end), n_interp);
counts_interp = interp1(bins, counts, bins_interp, 'spline');
plot(bins, counts) %// original histogram
figure
plot(bins_interp, counts_interp) %// interpolated histogram

Example: let

data = randn(1,1e4);

Original histogram:

[Figure: original histogram]

Interpolated:

[Figure: interpolated histogram]

Following your code, the y axis in the above figures gives the count, not the probability density. To get probability density you need to normalize:

normalization = 1/(bins(2)-bins(1))/sum(counts);
plot(bins, counts*normalization) %// original histogram
figure %// new figure, so the first plot is not overwritten
plot(bins_interp, counts_interp*normalization) %// interpolated histogram

Check: total area should be approximately 1:

>> trapz(bins_interp, counts_interp*normalization)
ans =
    1.0009
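One caveat with a cubic spline is that it can overshoot and dip below zero where neighbouring counts change sharply. As a possible variation, not part of the approach above, interp1 also accepts 'pchip', a shape-preserving piecewise cubic that stays within the range of the original counts. A minimal sketch with placeholder data:

```matlab
% Sketch: shape-preserving interpolation of histogram counts.
data = randn(1, 1e4);                              % placeholder data
[counts, bins] = hist(data, 10);                   % 10-bin histogram
bins_interp = linspace(bins(1), bins(end), 500);   % fine grid for the curve

% 'pchip' does not overshoot the data, so the curve cannot go negative
counts_pchip = interp1(bins, counts, bins_interp, 'pchip');

plot(bins, counts, 'o', bins_interp, counts_pchip, '-')
legend('hist counts', 'pchip interpolation')
```

The trade-off is that the pchip curve is slightly less smooth than the spline, since only its first derivative is continuous.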
Luis Mendo