0

Taking the ideas from the following links:

  1. the local minimum between the two peaks
  2. How to explain ...

I look for the local minimum or minimums, avoiding the use of functions already created for this purpose [max / min locale or global]. Our progress:

#DATA
simulate <- function(lambda=0.3, mu=c(0, 4), sd=c(1, 1), n.obs=10^5) {
  x1 <- rnorm(n.obs, mu[1], sd[1])
  x2 <- rnorm(n.obs, mu[2], sd[2])    
  return(ifelse(runif(n.obs) < lambda, x1, x2))
}

data <- simulate()
hist(data)
d <- density(data)
#
#https://stackoverflow.com/a/25276661/8409550
##Since the x-values are equally spaced, we can estimate dy using diff(d$y)
d$x[which.min(abs(diff(d$y)))] 
#With our data we did not obtain the expected value
#
d$x[which(diff(sign(diff(d$y)))>0)+1]#pit

d$x[which(diff(sign(diff(d$y)))<0)+1]#peak
 
#we check
#1
optimize(approxfun(d$x,d$y),interval=c(0,4))$minimum 
optimize(approxfun(d$x,d$y),interval=c(0,4),maximum = TRUE)$maximum 
#2
tp <- pastecs::turnpoints(d$y)
summary(tp)
ind <- (1:length(d$y))[extract(tp, no.tp = FALSE, peak = TRUE, pit = TRUE)]
d$x[ind[2]]
d$x[ind[1]]
d$x[ind[3]]

My questions and request for help:

  1. Why did the command lines fail: d$x[which.min(abs(diff(d$y)))]
  2. It is possible to eliminate the need to add one to the index in the command lines:
    d$x[which(diff(sign(diff(d$y)))>0)+1]#pit
    d$x[which(diff(sign(diff(d$y)))<0)+1]#peak
    
  3. How to get the optimize function to return the two expected maximum values?
HerClau
  • 161
  • 2
  • 15

1 Answers1

1

Question 1

The answer to the first question is straighforward. The line d$x[which.min(abs(diff(d$y)))] asks for the x value at which there was the smallest change in y between two consecutive points. The answer is that this happened at the extreme right of the plot where the density curve is essentially flat:

which.min(abs(diff(d$y)))
#> [1] 511

length(abs(diff(d$y)))
#> [1] 511

This is not only smaller than the difference at your local maxima /minima points; it is orders of magnitude smaller. Let's zoom in to the peak value of d$y, including only the peak and the point on each side:

which.max(d$y)
#> [1] 324

plot(d$x[323:325], d$y[323:325])

enter image description here

We can see that the smallest difference is around 0.00005, or 5^-5, between two consecutive points. Now look at the end of the plot where it is flattest:

 plot(d$x[510:512], d$y[510:512])

enter image description here

The difference is about 1^-7, which is why this is the flattest point.



Question 2

The answer to your second question is "no, not really". You are taking a double diff, which is two elements shorter than x, and if x is n elements long, a double diff will correspond to elements 2 to (n - 1) in x. You can remove the +1 from the index, but you will have an off-by-one error if you do that. If you really wanted to, you could concatenate dummy zeros at each stage of the diff, like this:

d$x[which(c(0, diff(sign(diff(c(d$y, 0))))) > 0)]

which gives the same result, but this is longer, harder to read and harder to justify, so why would you?



Question 3

The answer to the third question is that you could use the "pit" as the dividing point between the minimum and maximum value of d$x to find the two "peaks". If you really want a single call to get both at once, you could do it inside an sapply:

pit <- optimize(approxfun(d$x,d$y),interval=c(0,4))$minimum 

peaks <- sapply(1:2, function(i) {
                       optimize(approxfun(d$x, d$y),
                                interval = c(min(d$x), pit, max(d$x))[i:(i + 1)],
                                maximum = TRUE)$maximum
         })

pit
#> [1] 1.691798

peaks
#> [1] -0.02249845  3.99552521
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87