I ran into a very unlikely but very dangerous numerical error while integrating thousands of sufficiently well-behaved functions in R using the built-in integrate function.
Story (can be skipped). My problem comes from maximum likelihood estimation of a highly non-linear function (of 10–20 parameters) for which no analytical expression exists, so one evaluation requires computing thousands of integrals; the MWE below reproduces the error I ran into. Because of multiple local optima, I optimise this function with derivative-free methods (particle swarm from hydroPSO and differential evolution from DEoptim), trying 1000 points for 1000 iterations, so just for one model I have to compute more than a billion integrals (!), and there are 200 candidate models, each of which later requires hot-start reëstimation, so the total number of integrals is well over a trillion. I would like to find the fastest solution that gives sufficient accuracy.
The function is a product of two density functions (gamma or similar) times some positive expression, and the density of the sum is computed according to the formula f_{X+Y}(z) = int_{supp Y} f_{X,Y}(z - y, y) dy. I cannot use convolution because X and Y are not independent in the general case. The support of Y in my case is (-Inf, 0].
The scale parameter of the distribution is very small (the model is GARCH-like), so the standard integration routine very often fails: the integrand is non-zero only on a tiny section of the negative half-line (like [-0.02, -0.01], where it takes huge values, and essentially 0 everywhere else), and R's integrate often returns something of the order of the machine epsilon because it cannot find points in that range where the function is noticeably greater than zero. To combat this, I stretch the function around zero by the inverse of the scale parameter, which leaves the value of the integral unchanged, i.e. integrate(function(x) f(x * scale) * scale, -Inf, 0)$value. However, sometimes this re-scaling also failed, so I implemented a safety check: if the value of the integral of the scaled function is suspiciously low (i.e. < 1e-8), it is recomputed without the scaling. The rescaled function was working like a charm, returning sensible values where the non-scaled one failed, and in the rare cases where the rescaled function returned a machine epsilon, the non-rescaled one worked.
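Schematically, the rescale-and-fall-back logic described above looks like the sketch below (a simplified illustration, not my actual production code; the name integrateStable and the threshold argument exist only for this example, with f the unscaled integrand and scale its small scale parameter).
# Simplified sketch of the rescale-and-fall-back logic described above
integrateStable <- function(f, scale, lower = -Inf, upper = 0, threshold = 1e-8) {
  # Substituting x -> x * scale stretches the support by 1/scale
  # and leaves the value of the integral unchanged
  val <- integrate(function(x) f(x * scale) * scale, lower, upper)$value
  # Safety check: a value near machine epsilon means the quadrature missed the spike,
  # so recompute without the rescaling
  if (val < threshold) val <- integrate(f, lower, upper)$value
  val
}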
Until today, when the integration of the rescaled function suddenly yielded a value of 1.5 instead of 3.5. Of course, it passed the safety check (1.5 is a plausible value, not machine epsilon, and some other integrals were smaller than this, so it was in the common range). It turned out that in roughly 0.1% of all cases, integrate under-estimates the integral. The MWE is below.
First, we define the function of x
and an optional parameter numstab
that defines the scaling.
cons <- -0.020374721416129591
sc <- 0.00271245601724757383
sh <- 5.704
f <- function(x, numstab = 1) dgamma(cons - x * numstab, shape = sh, scale = sc) * dgamma(-x * numstab, shape = sh, scale = sc) * numstab
Next, we plot it to make sure that the scaling works correctly.
curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")
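Besides the visual check, here is a quick numerical sanity check of the substitution (an extra check added for this write-up, not part of the original code): undoing the scaling should reproduce the unscaled function.
# The scaled function, evaluated at x / sc and divided back by sc, should match f(x)
all.equal(f(-0.03), f(-0.03 / sc, numstab = sc) / sc) # Expected: TRUE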
And then we check this integral by summation:
sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # True value, 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value : num 3.58
# $ abs.error : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value : num 1.5 # WTF?!
# $ abs.error : num 0.000145 # WTF?!
# $ subdivisions: int 2
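Note that the reported abs.error is about 1e-4 of the returned (wrong) value, i.e. roughly the default rel.tol, so the error estimate alone gives no hint that the result is off by more than a factor of two. A brute-force diagnostic (my own sketch, far too expensive for the full run) is to cross-check the stabilised result against the plain one and flag any disagreement:
# Diagnostic cross-check (doubles the cost, so only for debugging, not the full run)
i.plain <- integrate(f, -Inf, 0)$value
i.stab <- integrate(f, -Inf, 0, numstab = sc)$value
if (abs(i.plain - i.stab) > 1e-3 * max(i.plain, i.stab))
  warning("Stabilised and plain integrals disagree: ", i.stab, " vs ", i.plain)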
It stops after just two subdivisions! Now, in order to see what is going on during the integration, we create a global object and update it every time the integration routine evaluates the function.
global.eval.f <- list()
f.trace <- function(x, numstab = 1) {
  this.f <- f(x, numstab)
  global.eval.f[[length(global.eval.f) + 1]] <<- list(x = x, f = this.f)
  return(this.f)
}
integrate(f.trace, -Inf, 0)
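After this call, global.eval.f contains one element per call that the quadrature makes to the integrand, each holding the vector of abscissae and the corresponding function values; a quick way to inspect it before plotting:
length(global.eval.f) # Number of calls the quadrature made to the integrand
sapply(global.eval.f, function(e) length(e$x)) # Abscissae per call (one quadrature panel at a time)
range(unlist(lapply(global.eval.f, `[[`, "x"))) # Range of points at which f was evaluated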
Now, we visualise this integration process.
library(animation)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.72, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1, -0.01, length.out = 301), f(xgrid), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function without stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1, l2), seq(325, 25, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 & sort(this.x) < -0.01, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-off-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)
And the same thing, but on a different scale.
global.eval.f <- list()
integrate(f.trace, -Inf, 0, numstab = sc)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.7, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1 / sc, -0.01 / sc, length.out = 301), f(xgrid, sc), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function with stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1 / sc, l2), seq(325 * sc, 25 * sc, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 / sc & sort(this.x) < -0.01 / sc, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-on-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)
The problem is, I cannot try several stabilising multipliers for every integral because I have to compute this integral about a trillion times, so even on a super-computer cluster this takes weeks. Besides that, tightening rel.tol to 1e-5 helped a bit, but I am not sure whether this guarantees success (and tightening it to 1e-7 slowed the computations down in some cases). I have also looked at the Fortran code of the quadrature routine, just to see which integration rule it uses.
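For reference, the two mitigations mentioned above look like this in code (the grid of multipliers is purely illustrative and is exactly the kind of thing that is infeasible a trillion times over):
# Tighter tolerance than the default of about 1.22e-4 (helped in some cases, 1e-7 was slower)
integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-5)
# Trying several stabilising multipliers and aggregating, e.g. taking the median
numstabs <- sc * c(0.5, 1, 2)
median(sapply(numstabs, function(m) integrate(f, -Inf, 0, numstab = m)$value))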
The timings can be seen below (I added an extra attempt with a lower tolerance).
How can I make sure that the integration routine does not produce such wrong results for this kind of function, while keeping the integration fast?