I have a discrete data set with multiple peaks. I am trying to generate an automatic method for fitting a Gaussian curve to an unknown number of data points. The ultimate goal is to provide a measure of uncertainty on the location (x-axis) of the peak in the y-axis, using the sigma value of a best-fit Gaussian curve. The full data set has a half dozen or so unique peaks of various shapes.
Here is a sample data set.
working <- data.frame(age = seq(1, 50), likelihood = c())
likelihood = c(10, 10, 10, 10, 10, 12, 14, 16, 17, 18,
19, 20, 19, 18, 17, 16, 14, 12, 11, 10,
10, 9, 8, 8, 8, 8, 7, 6, 6, 6))
Here is the Gaussian fitting procedure. I found it on SO, but I can't find the page I took it from again, so please forgive the lack of link and citation.
fitG =
function(x,y,mu,sig,scale)
f = function(p){
d = p[3] * dnorm( x, mean = p[ 1 ], sd = p[ 2 ] )
sum( ( d - y ) ^ 2)
}
optim( c( mu, sig, scale ), f )
}
This works well if I pre-define the area to fit. For instance taking only the area around the peak and using input mean
= 10, sigma
= 5, and scale
= 1:
work2 <- work[5:20, ]
fit1 <- fitG(work2$age, work2$likelihood, 10, 5, 1)
fitpar1 <- fit1$par
plot(work2$age, work2$likelihood, pch = 20)
lines(work2$age, fitpar1[3]*dnorm(work2$age, fitpar1[1], fitpar1[2]))
However, I am interested in automating the procedure in some way, where I define the peak centers for the whole data set using peakwindow
from the cardidates
package. The ideal function would then iterate the number of data points used in the fit around a given peak in order to optimize the Gaussian parameters. Here is my attempt:
fitG.2 <- function (x, y) {
g <- function (z) {
newdata <- x[(y - 1 - z) : (y + 1 + z), ]
newfit <- fitG( newdata$age, newdata$likelihood, 10, 5, 1)
}
optimize( f = g, interval = c(seq(1, 100)))
}
However, I can't get this type of function to actually work (an error I can't solve). I have also tried creating a function
with a for
loop and setting break
parameters but this method does not work consistently for peaks with widely varying shape parameters. There are likely many other R functions unknown to me that do exactly this.