
I ran into a very unlikely but very dangerous numerical error while integrating thousands of sufficiently well-behaved functions in R using the built-in integrate function.

Story (can be skipped). My problem is connected with maximum likelihood and is based on a highly non-linear function (of 10–20 parameters) for which no analytical expression exists, so one evaluation requires computing thousands of integrals. I have produced an MWE that reproduces this error. Because the function has multiple local optima, I optimise it with derivative-free methods (particle swarm from hydroPSO and differential evolution from DEoptim), trying 1000 points for 1000 iterations, so just for one model I have to compute more than a billion integrals (!), and there are 200 candidate models, each of which later requires hot-start reëstimation, so the total number of integrals is well over a trillion. I would like to find the fastest solution that gives sufficient accuracy.

The function is a product of two density functions (gamma or similar) times some positive expression, and the density of the sum is computed from the joint density according to the formula f_{X+Y}(z) = int_{supp Y} f_{X,Y}(z - y, y) dy. I cannot use a convolution because X and Y are not independent in the general case. The support of Y in my case is (-Inf, 0]. The scale parameter of the distribution is very small (the model is GARCH-like), so very often the standard integration routine fails on a function that is non-zero only on a very small section of the negative half-line (like [-0.02, -0.01], where it takes huge values, and 0 everywhere else, where the quadrature tries hard in vain), and R's integrate would often return a value around the machine epsilon because it could not find points in that range where the function took values much greater than zero. To combat this problem, I stretch the function around zero by the inverse of the scale parameter and multiply it by the scale (the Jacobian of the substitution), which leaves the integral unchanged, i.e. I compute something like integrate(function(u) f(u * scale) * scale, lower, upper)$value. However, sometimes this re-scaling also failed, so I implemented a safety check: if the value of the scaled integral is suspiciously low (i.e. < 1e-8), recompute the integral without scaling. The rescaled function was working like a charm, returning nice values where the non-scaled one failed, and in the rare cases where the rescaled function returned a machine epsilon, the non-rescaled one worked.
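To make this concrete, here is a minimal sketch of the trick under the substitution x = u * scale; the helper integrate_scaled and the toy spike g below are purely illustrative and are not part of the actual model.

integrate_scaled <- function(f, lower, upper, scale) {
  # Substitute x = u * scale: the integrand in u is stretched away from zero,
  # and the factor `scale` (the Jacobian) keeps the value of the integral unchanged
  integrate(function(u) f(u * scale) * scale, lower / scale, upper / scale)
}

g <- function(x) dnorm(x, mean = -0.015, sd = 0.001)  # narrow spike on the negative half-line
integrate(g, -Inf, 0)$value                # plain call
integrate_scaled(g, -Inf, 0, 0.001)$value  # rescaled call; the true value is ~1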

Until today, when integration of the rescaled function suddenly yielded a value of 1.5 instead of 3.5. Of course the result passed the safety check (because 1.5 is a plausible value, not the machine epsilon, and some other integrals were smaller than this, so it was well within the common range). It turned out that in roughly 0.1% of all cases, integrate under-estimated the integral. The MWE is below.
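For concreteness, the safety check described above looks roughly like the sketch below (the wrapper name integral_with_check is hypothetical, and it assumes an integrand with a numstab argument, as in the MWE that follows). Note that the erroneous 1.5 is nowhere near the 1e-8 threshold, so it slips through this check unnoticed.

integral_with_check <- function(f, scale, tiny = 1e-8) {
  # Try the rescaled integrand first; fall back to the unscaled one
  # only if the result looks like a numerical zero
  v <- integrate(f, -Inf, 0, numstab = scale)$value
  if (v < tiny) v <- integrate(f, -Inf, 0)$value
  v
}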

First, we define the function of x and an optional parameter numstab that defines the scaling.

cons <- -0.020374721416129591   # constant inside the first density's argument
sc   <- 0.00271245601724757383  # scale of the gamma densities (very small)
sh   <- 5.704                   # shape of the gamma densities
# numstab stretches the argument and multiplies by the Jacobian,
# so the integral over (-Inf, 0] is unchanged
f <- function(x, numstab = 1)
  dgamma(cons - x * numstab, shape = sh, scale = sc) *
    dgamma(-x * numstab, shape = sh, scale = sc) * numstab

Next, we plot it to make sure that the scaling works correctly.

curve(f, -0.06, 0, n = 501, main = "Unscaled f", bty = "n")
curve(f(x, sc), -0.06 / sc, 0, n = 501, main = "Scaled f", bty = "n")

Unscaled function for integration; scaled function for integration

And then we check this integral by summation:

sum(f(seq(-0.08, 0, 1e-6))) * 1e-6 # True value, 3.575294
sum(f(seq(-30, 0, 1e-4), numstab = sc)) * 1e-4 # True value, 3.575294
str(integrate(f, -Inf, 0)) # Gives 3.575294
# $ value       : num 3.58
# $ abs.error   : num 1.71e-06
# $ subdivisions: int 10
str(integrate(f, -Inf, 0, numstab = sc))
# $ value       : num 1.5 # WTF?!
# $ abs.error   : num 0.000145 # WTF?!
# $ subdivisions: int 2

It stops after just two subdivisions! Now, in order to see what is going on during the integration, we create a global object and update it every time the integration routine evaluates the integrand.

global.eval.f <- list()
f.trace <- function(x, numstab = 1) {
  this.f <- f(x, numstab)
  # Record each batch of points and function values requested by the quadrature
  global.eval.f[[length(global.eval.f) + 1]] <<- list(x = x, f = this.f)
  return(this.f)
}
integrate(f.trace, -Inf, 0)

Now, we visualise this integration process.

library(animation)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.72, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1, -0.01, length.out = 301), f(xgrid), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function without stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1, l2), seq(325, 25, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 & sort(this.x) < -0.01, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-off-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)

Quadrature without stabilisation

And the same thing, but on a different scale.

global.eval.f <- list()
integrate(f.trace, -Inf, 0, numstab = sc)
l <- length(global.eval.f)
mycols <- rainbow(l, end = 0.7, v = 0.8)
saveGIF({
  for (i in 1:l) {
    par(mar = c(4, 4, 2, 0.3))
    plot(xgrid <- seq(-0.1 / sc, -0.01 / sc, length.out = 301), f(xgrid, sc), type = "l", bty = "n", xlab = "x", ylab = "f(x)", main = "Function with stabilisation")
    for (j in 1:(l2 <- length(this.x <- global.eval.f[[i]]$x))) lines(rep(this.x[j], 2), c(0, global.eval.f[[i]]$f[j]), col = mycols[i], type = "b", pch = 16, cex = 0.6)
    legend("topleft", paste0("Quadrature: ", i), bty = "n")
    text(rep(-0.1 / sc, l2), seq(325 * sc, 25 * sc, length.out = l2), labels = formatC(sort(this.x), format = "e", digits = 2), adj = 0, col = ifelse(sort(this.x) > -0.1 / sc & sort(this.x) < -0.01 / sc, mycols[i], "black"), cex = 0.9)
  }
}, movie.name = "stab-on-quad.gif", interval = 1 / 3, ani.width = 400, ani.height = 300)

Quadrature with stabilisation

The problem is, I cannot try various stabilising multipliers for the function because I have to compute this integral a trillion times, so even on a super-computer cluster, this takes weeks. Besides that, tightening rel.tol to 1e-5 helped a bit, but I am not sure whether this guarantees success (and tightening it further to 1e-7 slowed down the computations in some cases). I have also looked at the Fortran code behind the quadrature just to see the integration rule it uses.
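For reference, these tolerance experiments amount to passing rel.tol directly to integrate (its default is .Machine$double.eps^0.25, roughly 1.2e-4); the calls below merely illustrate the syntax, and I cannot promise that any particular tolerance fixes the case above.

str(integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-5))  # tighter than the default
str(integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-7))  # tighter still, noticeably slower in some cases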

The timings can be seen below (I added an extra attempt with a lower tolerance).

Integration timings
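One possible way to collect such timings is sketched below (this is an assumption about the benchmarking setup, not the exact code behind the figure), using the microbenchmark package:

library(microbenchmark)
microbenchmark(
  unscaled      = integrate(f, -Inf, 0),
  scaled        = integrate(f, -Inf, 0, numstab = sc),
  scaled.lowtol = integrate(f, -Inf, 0, numstab = sc, rel.tol = 1e-5),
  times = 1000L
)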

How can I make sure that the integration routine does not produce such wrong results for a function like this, while the integration still remains fast?

  • Can you determine an approximate bound on the lower limit of integration? The problem seems to be associated with `-Inf` as the lower limit of integration. – thc Apr 11 '19 at 19:46
  • @thc It is possible, yes, but not very practical. Since the `x` variable represents the distribution of one-sign market returns, we can safely assume that the unscaled function will never take values significantly different from zero below −0.3. However, this model is used to compute the Value-at-Risk (5%, 1%, 0.1%), so tail behaviour is extremely important. In other models, instead of the gamma distribution, a power-law decay is used, and the bound might be wide. Finally, the official `integrate` manual advocates the use of actual `Inf`inite limits instead of just very large values (like -1e1)? – Andreï V. Kostyrka Apr 11 '19 at 20:08
  • You could use a bounded integral to detect when there is an issue, if it differs from the unbounded integral too much. Two meta-suggestions: 1) you may get better responses on r-devel, as your example does seem sort of bug-lite. 2) You may eventually want to check out other libraries, as R would be slower with that much computation. E.g.: https://www.boost.org/doc/libs/master/libs/math/doc/html/math_toolkit/gauss_kronrod.html – thc Apr 11 '19 at 20:13
  • @thc Thank you. To be on the safe side, I shall try three options (unscaled unbounded, scaled unbounded, unscaled bounded) and then pick the maximum (because it can only under-estimate); a rough sketch of this is given after these comments. I shall also try to use the Boost library and maybe pre-compute more points (with the G10–K21 or G15–K31 rule; I shall `microbenchmark` which one converges faster). – Andreï V. Kostyrka Apr 11 '19 at 20:22
  • Do you mind if I post this question to r-devel? Would you agree that it's a bug? – thc Apr 12 '19 at 16:38
  • @thc I wrote to r-devel 6 hours ago, but this thing seems to be on moderation. I should be grateful if you either approved it (if you can) or posted it. – Andreï V. Kostyrka Apr 12 '19 at 20:13
  • I have no power there, but if it's your first post, it requires a moderator to manually approve. Someone will get to it eventually ^_^ – thc Apr 12 '19 at 20:21
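Following up on the plan in the comment above, here is a rough sketch of the three-way cross-check (the function name robust_integral is hypothetical; the bound of −0.3 comes from the discussion of market returns above):

robust_integral <- function(f, scale, lower_bound = -0.3) {
  # Evaluate the integral three ways and keep the maximum,
  # since the failures observed so far only under-estimate the true value
  max(integrate(f, -Inf, 0)$value,                   # unscaled, unbounded
      integrate(f, -Inf, 0, numstab = scale)$value,  # scaled, unbounded
      integrate(f, lower_bound, 0)$value)            # unscaled, bounded below
}
robust_integral(f, sc)  # for the MWE above this recovers the correct ~3.575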
