
I'm trying to use pgtol in R's optim function and not getting anywhere.

I optimize this function

RosenbrockFactory <- function(a,b) {
  function(v) {
    return( (a-v[1])^2 + b*(v[2]-v[1]^2)^2 )
  } 
}
# the exact function to optimize
fn <- RosenbrockFactory(2,3)
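
As a sanity check, the minimum of this function is at (a, a^2) = (2, 4), where the value is exactly zero:

fn(c(2,4))
# [1] 0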

I use this helper function to get the gradient for a=2, b=3:

gradR <- function(v) {
  x <- v[1]
  y <- v[2]
  gr <- rep(0,2)
  gr[1] <- -2 * (2-x) - 12 * (y-x^2)*x
  gr[2] <- 3*2*(y-x^2)
  return(gr)
}
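
As a rough check of the analytic gradient, a quick central-difference approximation (numGradCheck below is just a throwaway helper I wrote for this check) agrees with gradR:

numGradCheck <- function(f, v, h=1e-6) {
  sapply(seq_along(v), function(i) {
    e <- rep(0, length(v)); e[i] <- h
    (f(v + e) - f(v - e)) / (2*h)
  })
}
numGradCheck(fn, c(0.5, 1.5))  # approximately -10.5  7.5
gradR(c(0.5, 1.5))             # [1] -10.5   7.5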

Default optimization; checking, the gradient is larger than I want:

fn <- RosenbrockFactory(2,3)
opt0 <- optim(c(0,0), fn, method="L-BFGS-B")
gradR(opt0$par)
# [1]  9.060194e-05 -3.449572e-05
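
(As an aside, optim can also take the analytic gradient through its gr argument; a quick sketch, which I leave out of the remaining experiments so the runs stay comparable:)

optG <- optim(c(0,0), fn, gr=gradR, method="L-BFGS-B")
gradR(optG$par)  # check how small the gradient gets with an exact gradient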

The help page for optim in the R stats package says pgtol "helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed."

This only says what happens when pgtol is 0; it doesn't say what non-zero values do, so I try a large value of 1e10:

opt10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e10))
gradR(opt10$par)
# [1] -4  0
opt10$par
# [1] 0 0

That stops any optimization. Okay, try going the other way:

optm10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-10))
gradR(optm10$par)
# [1]  9.060194e-05 -3.449572e-05

The small value (1e-10) changed nothing relative to the run before I set pgtol. So I push it down even further:

optm300 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-300))
gradR(optm300$par)
# [1]  9.060194e-05 -3.449572e-05

So it did nothing again, even at roughly the smallest representable double.
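
To see which stopping test actually fired, the value optim returns for "L-BFGS-B" includes convergence and message components (the message text comes from the underlying Fortran code, so I won't reproduce it here):

optm300$convergence  # 0 means the solver reports successful convergence
optm300$message      # names the stopping test that triggered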

Just to verify: I can optimize this with Newton's method (using the analytic Hessian hessR defined below).

Newton <- function(x0, tol) {
  xi <- x0
  grNorm <- tol*2 # ensure the loop runs at least once
  while (grNorm > tol) {
    f0 <- fn(xi)
    gr <- gradR(xi)                     # analytic gradient
    H <- hessR(xi)                      # analytic Hessian
    step <- -1 * qr.solve(qr(H), gr)    # Newton step: solve H %*% step = -gr
    xi <- xi + step
    grNorm <- sqrt(sum(gr*gr))          # gradient norm at the previous iterate
  }
  return(xi)
}

hessR <- function(v) {
  x <- v[1]
  y <- v[2]
  H <- matrix(0, nrow=2, ncol=2)
  H[1,1] <- 2 - 12 * y + 36*x^2
  H[1,2] <- H[2,1] <- -12*x
  H[2,2] <-  6
  return(H)
}


n1 <- Newton(c(-1,2), 1e-1)
gradR(n1)
# [1]  0.0010399242 -0.0002601235
n2 <- Newton(c(-1,2), 1e-2)
gradR(n2)
# [1] -1.461804e-10 -4.822809e-13
n3 <- Newton(c(-1,2), 1e-3)
gradR(n3)
# [1] 0 0
print(n3 - c(2,4), digits=16) # exactly at the correct answer

So here, the lower I set the tolerance, the closer I get. Of course, once Newton's method is close it lands on the exact answer.

This is an MWE showing that I cannot get L-BFGS-B to continue until the gradient is as small as I ask, for a function that is somewhat difficult to optimize but that Newton's method can, in fact, solve exactly.

Any idea how pgtol works, or what I can do to convince optim to keep going until it has found a result with a gradient below a prespecified size? I'd accept a criterion on the gradient norm; I just want to be able to encourage it to keep going.


1 Answer


Please note that with method "L-BFGS-B" the option that controls the convergence is 'factr'!

opt0 <- optim(c(0,0), fn, method="L-BFGS-B",
              control = list(factr = 1e-10))

opt0$par            #>  1.999988  3.999952
gradR(opt0$par)     #> -2.399986e-05  0.000000e+00
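
Note that factr is a relative tolerance: per the optim help page, convergence occurs when the reduction in the objective is within factr times the machine epsilon. So a value like 1e-10 corresponds to a tolerance of roughly 2e-26, i.e. the solver runs until it can make essentially no further progress on the function value:

.Machine$double.eps           # about 2.2e-16
1e-10 * .Machine$double.eps   # about 2.2e-26
1e7   * .Machine$double.eps   # the default factr = 1e7 gives roughly 2e-9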

Methods like "L-BFGS-B" minimize the function values, they do not attempt to make the gradient as small as possible at the optimal point.
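
You can see this in the numbers above: at the returned parameters the function value is already essentially at the true minimum of 0, even though the gradient there is only of order 1e-5 (a quick check, reusing fn and opt0 from above):

fn(opt0$par)   # very small, very close to the true minimum
fn(c(2,4))     # the exact minimum: 0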

There are more modern optimization functions for R, see the "Optimization and Mathematical Programming" Task View.

library(lbfgsb3c)

lbfgsb3(c(0,0), fn, control = list(factr = 1e-10))
$par
[1] 2 4
$grad
[1]  5.846166e-14 -3.804366e-14

This uses the Fortran implementation of "L-BFGS-B" by Nocedal et al.

  • Thanks for pointing that out. Yes, factr controls the L-BFGS-B method, but the current pgtol manual entry text is, "helps control the convergence of the 'L-BFGS-B' method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed." And that's what I want to use as a convergence criterion: a function of the gradient. I'll look at lbfgsb3c, thanks for that tip. – pdb Mar 07 '22 at 14:29
  • I think you are misled by a - maybe unclear - help entry. The solver stops when the minimum is reached with sufficient accuracy, even if the projected gradient is larger than pgtol. You will only change this behavior by interfering with the code. – Hans W. Mar 07 '22 at 21:57
  • in what sense does `pgtol` "help control the convergence"? – pdb Mar 08 '22 at 14:55
  • 1
    You saw this with your `1e10` test: It stops the iteration if the projected gradient is smaller than `pgtol`; but it does *not* force the iteration to continue *until* the gradient is smaller. Very small gradients can be a computation/precision problem for BFGS methods, and `pgtol` helps to avoid it. -- I admit this is an "educational guess". If you want to know better, post on Stackoverflow with tag "[mathematical-optimization]" (no "[r]"), as your question is largely independent of any implementation. – Hans W. Mar 08 '22 at 19:19