I'm trying to use pgtol in R's optim function and not getting anywhere. I'm optimizing this function:
RosenbrockFactory <- function(a, b) {
  function(v) {
    return( (a - v[1])^2 + b * (v[2] - v[1]^2)^2 )
  }
}
# the exact function to optimize
fn <- RosenbrockFactory(2, 3)
I use this helper function to get the gradient for a=2, b=3:
gradR <- function(v) {
  x <- v[1]
  y <- v[2]
  gr <- rep(0, 2)
  gr[1] <- -2 * (2 - x) - 12 * (y - x^2) * x
  gr[2] <- 3 * 2 * (y - x^2)
  return(gr)
}
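Since both terms of fn are squares, the minimum is at v = c(2, 4), where the function value and the gradient are exactly zero (this is the point the Newton runs below land on); a quick check:
fn(c(2, 4))
# [1] 0
gradR(c(2, 4))
# [1] 0 0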
Default optimization, and check: the gradient is higher than I want.
fn <- RosenbrockFactory(2,3)
opt0 <- optim(c(0,0), fn, method="L-BFGS-B")
gradR(opt0$par)
# [1] 9.060194e-05 -3.449572e-05
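(As an aside: optim also accepts an analytic gradient through its gr argument, which the "L-BFGS-B" method uses in place of finite differences. I haven't rerun the examples this way, and I'd expect the stopping behaviour to still be governed by the control settings, but the call would be:)
opt0g <- optim(c(0,0), fn, gr=gradR, method="L-BFGS-B")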
The help for optim in R's stats package says pgtol "helps control the convergence of the "L-BFGS-B" method. It is a tolerance on the projected gradient in the current search direction. This defaults to zero, when the check is suppressed."
This only says what happens when pgtol is 0; it doesn't say what non-zero values do, so I try a large value of 1e10:
opt10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e10))
gradR(opt10$par)
# [1] -4 0
opt10$par
# [1] 0 0
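My reading (the help page doesn't spell this out) is that pgtol is compared against the size of the projected gradient, and with no box constraints the projected gradient at the starting point c(0,0) is just the gradient, which is already far below 1e10, so the very first convergence check passes:
max(abs(gradR(c(0,0))))
# [1] 4
# 4 <= 1e10, so the solver can declare convergence at the starting point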
So a huge pgtol stops any optimization at all. Okay, try going the other way:
optm10 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-10))
gradR(optm10$par)
# [1] 9.060194e-05 -3.449572e-05
The small value (1e-10) changed nothing relative to the default. So I try cranking it down even further:
optm300 <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-300))
gradR(optm300$par)
# [1] 9.060194e-05 -3.449572e-05
So it did nothing again, even at roughly the smallest positive double.
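(The same help page also documents factr for "L-BFGS-B": convergence occurs when the reduction in the objective is within that factor of the machine tolerance, with a default of 1e7, i.e. a tolerance of about 1e-8. I haven't tested how it interacts with pgtol here, but a call tightening both would look like:)
optBoth <- optim(c(0,0), fn, method="L-BFGS-B", control=list(pgtol=1e-10, factr=10))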
Just to verify, I can optimize this with Newton's method:
Newton <- function(x0, tol) {
  xi <- x0
  grNorm <- tol * 2  # let the iteration start
  while (grNorm > tol) {
    f0 <- fn(xi)
    gr <- gradR(xi)
    H <- hessR(xi)
    step <- -1 * qr.solve(qr(H), gr)
    xi <- xi + step
    grNorm <- sqrt(sum(gr * gr))
  }
  return(xi)
}
hessR <- function(v) {
  x <- v[1]
  y <- v[2]
  H <- matrix(0, nrow=2, ncol=2)
  H[1,1] <- 2 - 12 * y + 36 * x^2
  H[1,2] <- H[2,1] <- -12 * x
  H[2,2] <- 6
  return(H)
}
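(A quick sanity check on the hand-derived derivatives, assuming the numDeriv package is available; its grad and hessian functions should agree with gradR and hessR up to numerical error:)
numDeriv::grad(fn, c(-1, 2))     # compare with gradR(c(-1, 2))
numDeriv::hessian(fn, c(-1, 2))  # compare with hessR(c(-1, 2))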
n1 <- Newton(c(-1,2), 1e-1)
gradR(n1)
# [1] 0.0010399242 -0.0002601235
n2 <- Newton(c(-1,2), 1e-2)
gradR(n2)
# [1] -1.461804e-10 -4.822809e-13
n3 <- Newton(c(-1,2), 1e-3)
gradR(n3)
# [1] 0 0
print(n3 - c(2,4), digits=16) # exactly at the correct answer
So here, the lower I set the tolerance, the closer I get. Of course, once Newton's method is close, it converges to the exact answer.
This is an MWE to show that I cannot get L-BFGS-B to continue until the gradient is as small as I ask, for a function that is somewhat difficult to optimize but that Newton's method can, in fact, solve exactly.
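(One thing that might help diagnose where it stops, though I haven't dug into its output above: the help page says the trace control prints progress for "L-BFGS-B" at up to six levels, e.g.:)
optTrace <- optim(c(0,0), fn, method="L-BFGS-B", control=list(trace=6, pgtol=1e-10))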
Any idea how pgtol works, or what I can do to convince optim to keep going until it has found a result with the gradient below a prespecified size? I'd accept this in any norm; I just want to be able to encourage it to keep going.