0

I would like to plot multiple frequency histograms on one graph, however, my frequency plot is jagged and not pretty. As shown below with this code:

mmin = min([Data]);
mmax = max([Data]);
figure(1);n = hist(Data, x);
f = n/sum(n);
plot(x,f,'r','LineWidth',3)

enter image description here

To make it smooth, I looked into ksdensity and created the graph below based on this code:

[f,xi] = ksdensity(data);
figure(1)
plot(xi,f);

enter image description here

However, I noticed that my graph is no longer plotting frequency on the y-axis. Is there anyway to correct for this change using ksdensity? I really like how the graph looks as opposed to my frequency histogram and would like to keep using ksdensity, unless there is a better suggestion.

Thank you!

Data Sample: https://www.dropbox.com/s/4ax2cuvugvqxjh6/splicing_attempt2_normalized_combined.txt?dl=0

cosmictypist
  • 555
  • 1
  • 7
  • 17
  • there was a similar post yesterday, here is the link http://stackoverflow.com/questions/31944968/smoothing-the-curve , basically, the moving average computed with interpolation is the way to go (I seem to remember) – GameOfThrows Aug 12 '15 at 13:30
  • @GameofThrows I will look into this. Thank you! – cosmictypist Aug 12 '15 at 13:31
  • you are welcome, the ksdensity does normalization by default (i.e. area under the curve should equal 1). So it might be easier to adjust your plot using the method in the post (which allows you to set kernel size as well) – GameOfThrows Aug 12 '15 at 13:39
  • @christylynn002 - I see why you visited my post. The question I solved, the OP only wanted to smooth the corner points. You want to have a significantly increased factor of smoothing, so choose a window size that's quite large... something like 20... or even 30. You have to experiment to get good results. – rayryeng Aug 12 '15 at 16:16
  • @rayryeng ah, I see. Thank you for replying. – cosmictypist Aug 12 '15 at 16:23
  • @christylynn002 - No problem. Let me know how it goes! In the mean time, I'd love if you could post your data somewhere. That way, I can check to see if what I did can apply in your case. – rayryeng Aug 12 '15 at 16:25
  • @rayryeng I was actually struggling to use your method. I kept getting error messages and was in the process of trying to understand where I went wrong when I got commandeered by my boss. I'll post a link to some sample data in a couple minutes. – cosmictypist Aug 12 '15 at 16:29
  • That's fine. I can take a look and can adapt the method I wrote to your data. – rayryeng Aug 12 '15 at 16:31
  • @rayryeng I've edited my post to include a link to some of my data – cosmictypist Aug 12 '15 at 16:35

1 Answers1

2

The trick is here that I don't think you are calculating the frequency correctly in your histogram. You are neglecting the bin width. Your frequency should be the number of SNPs per position, which requires dividing by the number of (possibly fractional) positions per bin.

Try this:

Data = rand(1, 1e4);

figure(1);
[n, c] = hist(Data, 20);
dc = abs(c(2) - c(1));

f = n./(dc * sum(n));
plot(c,f,'r','LineWidth',3)

[~,f_kde,xi] = kde(Data);
line(xi,f_kde);

I don't have the Statistics Toolbox, so I'm using the File Exchange kde function instead, but both work the same way.

If the first graph is indeed what you are after, then do a little algebra-fu, and instead of dividing the histogram values by the bin width, multiply the kdensity values by the same bin width.

As I mention in my other histogram answer, there are numerous methods for choosing optimal bin widths for a histogram. I chose 20 here for expediency.

Community
  • 1
  • 1
craigim
  • 3,884
  • 1
  • 22
  • 42
  • I tried this methodology using `ksdensity` instead of `kde` and I'm still getting jagged peaks. I also created the kde function from http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator/content/kde.m, but that was still successful... – cosmictypist Aug 12 '15 at 16:56
  • Your second plot above doesn't look jagged, so if your real data (I can't access Dropbox from work so I can't see for myself) is radically different than the example, I can't help you much, but if not, I really think you are already where you need to be. The output from `kde` and `kdensity` is already a proper frequency, so there is no normalization necessary. You should be taking the output from these functions directly for comparing your different histograms. – craigim Aug 12 '15 at 18:21
  • that's what I figured. Thank you for the help. – cosmictypist Aug 12 '15 at 18:22