0

I have a discrete data set with multiple peaks. I am trying to generate an automatic method for fitting a Gaussian curve to an unknown number of data points. The ultimate goal is to provide a measure of uncertainty on the location (x-axis) of the peak in the y-axis, using the sigma value of a best-fit Gaussian curve. The full data set has a half dozen or so unique peaks of various shapes.

Here is a sample data set.

working <- data.frame(age = seq(1, 50), likelihood = c())
likelihood = c(10, 10, 10, 10, 10, 12, 14, 16, 17, 18, 
           19, 20, 19, 18, 17, 16, 14, 12, 11, 10,
           10, 9, 8, 8, 8, 8, 7, 6, 6, 6))

Here is the Gaussian fitting procedure. I found it on SO, but I can't find the page I took it from again, so please forgive the lack of link and citation.

fitG =
function(x,y,mu,sig,scale)
f = function(p){
  d = p[3] * dnorm( x, mean = p[ 1 ], sd = p[ 2 ] )
  sum( ( d - y ) ^ 2)
  }
optim( c( mu, sig, scale ), f )
}

This works well if I pre-define the area to fit. For instance taking only the area around the peak and using input mean = 10, sigma = 5, and scale = 1:

work2       <- work[5:20, ]
fit1        <- fitG(work2$age, work2$likelihood, 10, 5, 1)
fitpar1     <- fit1$par
plot(work2$age, work2$likelihood, pch = 20)
lines(work2$age, fitpar1[3]*dnorm(work2$age, fitpar1[1], fitpar1[2]))

However, I am interested in automating the procedure in some way, where I define the peak centers for the whole data set using peakwindow from the cardidates package. The ideal function would then iterate the number of data points used in the fit around a given peak in order to optimize the Gaussian parameters. Here is my attempt:

fitG.2 <- function (x, y) {
  g <- function (z) {
    newdata <- x[(y - 1 - z) : (y + 1 + z), ]
    newfit  <- fitG( newdata$age, newdata$likelihood, 10, 5, 1)
  }
  optimize( f = g, interval = c(seq(1, 100)))
}

However, I can't get this type of function to actually work (an error I can't solve). I have also tried creating a function with a for loop and setting break parameters but this method does not work consistently for peaks with widely varying shape parameters. There are likely many other R functions unknown to me that do exactly this.

Jesse
  • 244
  • 2
  • 15
  • At it's core this seems to be a request for "2d peak finding" in R and an SO search brings up this Q&A that appears to address that satisfactorily. http://stackoverflow.com/questions/11059104/given-a-2d-numeric-height-map-matrix-in-r-how-can-i-find-all-local-maxima . If you disagree you should explain in more detail. Your dataset does not appear to be a "2-d" problem. – IRTFM Oct 15 '15 at 18:27
  • You are right, I don't mean 2D. Thank you for pointing that error out. – Jesse Oct 15 '15 at 18:36

0 Answers0