So I've got a data set that I want to parameterise but it is not a Gaussian distribution so I can't parameterise it in terms of it's mean and standard deviation. I want to fit a distribution function with a set of parameters and extract the values of the parameters (eg. a and b) that give the best fit. I want to do this exactly the same as the
lm(y~f(x;a,b))
except that I don't have a y, I have a distribution of different x values.
Here's an example. If I assume that the data follows a Gumbel, double exponential, distribution
f(x;u,b) = 1/b exp-(z + exp-(z)) [where z = (x-u)/b]:
#library(QRM)
#library(ggplot2)
rg <- rGumbel(1000) #default parameters are 0 and 1 for u and b
#then plot it's distribution
qplot(rg)
#should give a nice skewed distribution
If I assume that I don't know the distribution parameters and I want to perform a best fit of the probability density function to the observed frequency data, how do I go about showing that the best fit is (in this test case), u = 0 and b = 1?
I don't want code that simply maps the function onto the plot graphically, although that would be a nice aside. I want a method that I can repeatedly use to extract variables from the function to compare to others. GGPlot / qplot was used as it quickly shows the distribution for anyone wanting to test the code. I prefer to use it but I can use other packages if they are easier.
Note: This seems to me like a really obvious thing to have been asked before but I can't find one that relates to histogram data (which again seems strange) so if there's another tutorial I'd really like to see it.