It is "well-known" that the BFGS optimization algorithm is superlinearly convergent for strictly convex problems, but is there any analysis for problems that are non-strictly convex. For example, suppose that f(x) is convex for some scalar x. Then, suppose we optimize over g(x1,x2)=f(x1+x2). Will this still always be superlinearly convergent?
- You might want to try a few more tags like "algorithm" or "numerical-analysis"; "optimization" around here is usually in the sense of "how do I optimize this bit of code". I'm not sure if MathOverflow.net would be a better place to ask this; it might not be a hard enough (i.e. research-level) question for them. – celion Feb 16 '10 at 23:07
3 Answers
Whether BFGS converges at all on non-convex problems is still an open question. In fact, in 1984 Powell gave a counterexample showing that BFGS with an inexact line search may fail to converge. What can be proved are local statements, such as: given a local minimum x*, if the iterates eventually enter a neighbourhood of x*, then BFGS converges superlinearly. The reason is that near x* the objective function is accurately modelled by a convex quadratic.
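A quick empirical illustration of that local statement (a sketch using scipy.optimize; the Rosenbrock function is my own choice of non-convex test objective):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Rosenbrock is non-convex globally, but near its minimizer (1, 1) it is
# well modelled by a convex quadratic, which is where the superlinear
# rate of BFGS kicks in.
errors = []
res = minimize(rosen, np.array([-1.2, 1.0]), method='BFGS',
               callback=lambda xk: errors.append(np.linalg.norm(xk - 1.0)),
               options={'gtol': 1e-10})

# Once the iterates are close to the minimizer, the ratio e_{k+1} / e_k
# should tend towards 0; that decay is the signature of superlinear
# convergence.
for e_prev, e_next in zip(errors, errors[1:]):
    if e_prev > 0:
        print(e_next / e_prev)
```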
As for what is known about the composite function you gave, I am not sure. For a detailed treatment of the properties of BFGS, see either Dennis and Schnabel or Nocedal and Wright.
Best of luck.

In practice I have found that a carefully written implementation will converge, but not necessarily superlinearly; roundoff error is the culprit, and the convergence criteria come into play. The same holds for functions that are "almost" non-convex, i.e. stiff ones.
One must be careful with the BFGS updates to ensure that the approximate Hessian stays "sufficiently" positive definite even when the true Hessian is not. What I do is keep and update a Cholesky factorization of the approximate Hessian, rather than the Hessian itself or its inverse; a sketch of the idea is below.
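A minimal sketch of that safeguard, assuming Powell-style damping of the update. For clarity this sketch keeps B explicitly and refactorizes; the in-place factor update described above would instead use rank-one Cholesky update/downdate routines, which are not shown here:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def damped_bfgs_update(B, s, y, theta_min=0.2):
    """One damped BFGS update (Powell's damping) of the Hessian
    approximation B. Damping guarantees s @ y_tilde >= theta_min * s @ B @ s
    > 0, exactly the curvature condition that keeps B positive definite."""
    Bs = B @ s
    sBs = s @ Bs          # > 0 as long as B is positive definite and s != 0
    sy = s @ y
    if sy >= theta_min * sBs:
        y_tilde = y       # curvature is fine; plain BFGS update
    else:
        theta = (1.0 - theta_min) * sBs / (sBs - sy)
        y_tilde = theta * y + (1.0 - theta) * Bs
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y_tilde, y_tilde) / (s @ y_tilde))

# Usage on one hypothetical step: solve B p = -grad through the Cholesky
# factorization, never forming B's inverse.
B = np.eye(2)
grad = np.array([1.0, -2.0])
p = cho_solve(cho_factor(B), -grad)
s = p                               # unit step (no line search in this sketch)
y = np.array([0.5, -1.0])           # hypothetical gradient difference
B = damped_bfgs_update(B, s, y)     # s @ y < 0 here, so damping engages
print(np.linalg.eigvalsh(B))        # eigenvalues stay positive
```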

Correct me if I'm wrong, but won't the "solution" in this case actually be a line rather than a single point? If x' is a minimizer of f(x), then the best you can hope for when applying any method to g(x1, x2) is convergence to some point on the line x2 = x' - x1, as the sketch below illustrates.
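A quick check of that observation (scipy.optimize and the particular f are my own choices for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# f(s) = (s - 3)**2, so x' = 3: every point on the line x1 + x2 = 3
# (i.e. x2 = x' - x1) minimizes g(x1, x2) = f(x1 + x2).
g = lambda x: (x[0] + x[1] - 3.0) ** 2

for x0 in ([0.0, 0.0], [10.0, -4.0], [-2.0, 7.0]):
    res = minimize(g, np.array(x0), method='BFGS')
    # Different starting points land on different minimizers, but the
    # sum x1 + x2 converges to x' = 3 in every case.
    print(res.x, res.x.sum())
```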
