3

Suppose I have

set.seed(2020)    # make the results reproducible
a <- rnorm(100, 0, 1)

My probability density is estimated through kernel density estimator (gaussian) in R using the R built in function density. The question is how to integrate the square of the estimated density. It does not matter between which values, let us suppose between -Inf and +Inf. I have tried the following:

f <- approxfun(density(a)$x, density(a)$y)
integrate (f*f, min(density(a)$x), max(density(a)$x))
Z B
  • 131
  • 8

3 Answers3

3

There are a couple of problems here. First you have the x and y round the wrong way in approxfun. Secondly, you can't multiply function names together. You need to specify a new function that gives you the square of your original function:

set.seed(2020)   
a  <- rnorm(100, 0, 1)
f  <- approxfun(density(a)$x, density(a)$y)
f2 <- function(v) ifelse(is.na(f(v)), 0, f(v)^2)

integrate (f2, -Inf, Inf)
#> 0.2591153 with absolute error < 0.00011

We can also plot the original density function and the squared density function:

curve(f, -3, 3)
curve(f2, -3, 3, add = TRUE, col = "red")

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • when i try it with -Inf and Inf . I got problems – Z B Sep 22 '20 at 14:59
  • even when i use min and max i got maximum number of subdiviosns reached – Z B Sep 22 '20 at 15:04
  • @ZB it was returning `NA` outside the range of the density. See my update. It now works with `-Inf` and `Inf` – Allan Cameron Sep 22 '20 at 15:08
  • in fact i got Na yes that is True but even whem i used min and max I got maximum number of subdivisions reached . I forgot to say to you that i was using the whole code above inside a loop – Z B Sep 22 '20 at 15:18
  • @ZB I don't think anyone here can replicate this fault. Perhaps it is something to do with the loop? – Allan Cameron Sep 22 '20 at 15:23
  • above is code i have used to calculate the size of tests – Z B Sep 22 '20 at 15:38
  • i have tried the first code with -inf and inf and my computer was booting for ever i think it was NAs then i tried it wit min and max so i got maximum number of subdivisons reached Finally i tried the last changement you have done so i got roundoff was detected – Z B Sep 22 '20 at 15:44
  • I think i find out the secret !!! for 10000 loops the function do not work well . I think it is limited for 1000 – Z B Sep 22 '20 at 17:33
2

Here is a way using package caTools, function trapz. It computes the integral given a vector x and its corresponding image y using the trapezoidal rule.
I also include a function trapzf based on the original to have the integral computed with the function returned by approxfun

trapzf <- function(x, FUN) trapz(x, FUN(x))

set.seed(2020)    # make the results reproducible
a <- rnorm(100, 0, 1)

d <- density(a)
f <- approxfun(d$x, d$y)

int1 <- trapz(d$x, d$y^2)
int2 <- trapzf(d$x, function(x) f(x)^2)

int1
#[1] 0.2591226

identical(int1, int2)
#[1] TRUE
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Nice Rui. What do you think accounts for the small numerical difference in our final answers? – Allan Cameron Sep 22 '20 at 14:10
  • 2
    Function `trapz` uses the trapezoidal rule, I will edit with a link to the Wikipedia page. `integrate` uses adaptive quadrature, the result should not be the same. – Rui Barradas Sep 22 '20 at 14:13
  • @RuiBarradas superrrrrrrrrrr i will try it and then give you feedback – Z B Sep 22 '20 at 14:16
2

I think you should write the objective function as function(x) f(x)**2, rather than f*f, e.g.,

> integrate (function(x) f(x)**2, min(density(a)$x), max(density(a)$x))
0.2331793 with absolute error < 6.6e-06
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • @Allan Cameron Superrrrrrrrrrrrrrrrrrrr!!! i will try it and then give you feedback – Z B Sep 22 '20 at 14:15
  • I cannot get the same result, I'm getting [Allan's](https://stackoverflow.com/a/64011444/8245406) result. Maybe this is because I have set the RNG seed, unlike the OP. – Rui Barradas Sep 22 '20 at 14:24
  • @RuiBarradas I have no clue what happened. Yours and Allan's code also give me the result I achieved.... – ThomasIsCoding Sep 22 '20 at 21:17