I have some data that I'm assuming comes from a distribution, and I'm trying to estimate that distribution.
Right now I'm using the KernSmooth package in R with a Gaussian kernel, and I'm using the package's dpik() function to select the bandwidth automatically. (I assume it uses AMISE or something of the sort; please let me know if there is a better automatic bandwidth selection method.)
What I'm interested in, though, is finding the x-value that corresponds to the highest peak of the estimated density (the mode). This seemed like a very simple thing, something I put off as trivial earlier on, but to my frustration I'm hitting some snags.
The bkde() function in KernSmooth returns a set of (x, y) coordinates that trace out the estimated density. I know I could do a linear search through the output for the maximum y-value and grab the corresponding x-value, but since I am writing a function that may be called frequently in an automated process, I feel this is inefficient, especially since bkde() returns a lot of values.
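For reference, the linear search described above can be sketched as follows. This is a minimal sketch under stated assumptions: the sample `x` is made up for illustration, and `bkde()` is left at its default grid of 401 points, so the scan is a single vectorised pass over a short vector.

```r
library(KernSmooth)  # recommended package, shipped with standard R installations

set.seed(1)
x <- rnorm(500, mean = 3, sd = 1)  # made-up sample standing in for the real data

h   <- dpik(x)                 # automatically selected bandwidth
est <- bkde(x, bandwidth = h)  # list with components x and y (default gridsize = 401)

# which.max() returns the index of the first maximum in one pass over est$y
mode_x <- est$x[which.max(est$y)]
mode_x
```

The answer is only as precise as the grid spacing; the `gridsize` argument of `bkde()` controls that trade-off.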
My other idea was to fit a curve to the estimate, take its derivative, and set it equal to zero, but that sounds like it may be inefficient as well.
Maybe density() would be a better function to use here?
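A sketch of what the density() route might look like, combined with the curve-fitting idea. The sample `x` and the choice of `bw = "SJ"` are assumptions made for illustration; the refinement step interpolates the gridded estimate with a spline and maximises it between the grid points next to the grid maximum, which is one possible approach rather than an established recipe.

```r
set.seed(1)
x <- rnorm(500, mean = 3, sd = 1)  # made-up sample standing in for the real data

# density() also returns a gridded estimate (components x and y)
d <- density(x, bw = "SJ", kernel = "gaussian", n = 512)

i         <- which.max(d$y)
grid_mode <- d$x[i]  # mode to within one grid step

# Optional refinement: interpolate the grid and maximise near the grid maximum
f  <- splinefun(d$x, d$y)
lo <- d$x[max(i - 1, 1)]
hi <- d$x[min(i + 1, length(d$x))]
refined_mode <- optimize(f, interval = c(lo, hi), maximum = TRUE)$maximum

c(grid = grid_mode, refined = refined_mode)
```

The refined value can differ from the grid value by at most one grid step, so the extra work only matters when sub-grid precision is needed.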
Please let me know if there is an efficient way to do this.
I also plan to do a little inference on the distributions I find, such as finding the cutoff points that chop off a certain percentage of the tail on either side (i.e., confidence intervals) and finding the expected value. My vague plan for now is to use Monte Carlo techniques, or to draw from the distribution and use bootstrapping to estimate areas. Any help with methods for any of these would be greatly appreciated.
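To make the inference part concrete, here is a rough sketch of two ways the tail cutoffs and expected value might be obtained: numerically from the gridded estimate, and by drawing from the estimate. The sample `x`, the 2.5%/97.5% levels, and the number of draws are assumptions for illustration. The sampling step relies on the fact that a draw from a Gaussian kernel estimate is a resampled data point plus normal noise whose standard deviation is the bandwidth.

```r
library(KernSmooth)

set.seed(1)
x   <- rnorm(500, mean = 3, sd = 1)  # made-up sample standing in for the real data
h   <- dpik(x)
est <- bkde(x, bandwidth = h)

# Numerical route: turn the gridded density into probabilities on an equal-spaced grid
dx  <- est$x[2] - est$x[1]
p   <- pmax(est$y, 0) * dx  # guard against tiny negative values from rounding
p   <- p / sum(p)           # renormalise so the probabilities sum to 1
cdf <- cumsum(p)

mean_hat <- sum(est$x * p)                  # expected value
cut_lo   <- est$x[which(cdf >= 0.025)[1]]   # lower 2.5% cutoff
cut_hi   <- est$x[which(cdf >= 0.975)[1]]   # upper 2.5% cutoff

# Sampling route: resample the data and add Gaussian kernel noise
n_draws <- 10000
draws   <- sample(x, n_draws, replace = TRUE) + rnorm(n_draws, sd = h)
quantile(draws, c(0.025, 0.975))
```

For a symmetric kernel the mean of the kernel estimate equals the sample mean, so `mean(x)` gives the expected value directly; the grid computation is shown only because the same probabilities are needed for the cutoffs.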