It is "well-known" that the BFGS optimization algorithm is superlinearly convergent for strictly convex problems, but is there any analysis for problems that are non-strictly convex. For example, suppose that f(x) is convex for some scalar x. Then, suppose we optimize over g(x1,x2)=f(x1+x2). Will this still always be superlinearly convergent?
- You might want to try a few more tags like "algorithm" or "numerical-analysis"; "optimization" around here is usually in the sense of "how do I optimize this bit of code". I'm not sure if MathOverflow.net would be a better place to ask this; it might not be a hard enough (i.e. research-level) question for them. – celion Feb 16 '10 at 23:07
3 Answers
Whether BFGS converges at all on non-convex problems is still an open question. In fact, in 1984 Powell gave a counterexample showing that BFGS with an inexact line search may fail to converge. What can be proved are local statements, such as: given a local minimum x*, if the iterates eventually enter a neighbourhood of x*, then BFGS converges superlinearly. The reason is that near x* the objective function is accurately modelled by a convex quadratic.
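A quick empirical illustration of that local statement (a sketch using scipy.optimize; the Rosenbrock function is my own choice of non-convex test objective):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Rosenbrock is non-convex globally, but near its minimizer (1, 1) it is
# well modelled by a convex quadratic, which is where the superlinear
# rate of BFGS kicks in.
errors = []
res = minimize(rosen, np.array([-1.2, 1.0]), method='BFGS',
               callback=lambda xk: errors.append(np.linalg.norm(xk - 1.0)),
               options={'gtol': 1e-10})

# Once the iterates are close to the minimizer, the ratio e_{k+1} / e_k
# should tend towards 0; that decay is the signature of superlinear
# convergence.
for e_prev, e_next in zip(errors, errors[1:]):
    if e_prev > 0:
        print(e_next / e_prev)
```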
As for what is known about the composite function you gave, I am not sure. For a detailed treatment of the properties of BFGS, see either Dennis and Schnabel or Nocedal and Wright.
Best of luck.

In practice I have found that a carefully written implementation will converge, but not necessarily superlinearly; roundoff error is the culprit, and the convergence criteria come into play. The same holds for functions that are "almost" non-convex, i.e. stiff ones.
One must be careful with the BFGS updates to ensure that the approximate Hessian stays "sufficiently" positive definite even when the true Hessian is not. What I do is keep and update a Cholesky factorization of the approximate Hessian, rather than the Hessian itself or its inverse; a sketch of the idea is below.
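A minimal sketch of that safeguard, assuming Powell-style damping of the update. For clarity this sketch keeps B explicitly and refactorizes; the in-place factor update described above would instead use rank-one Cholesky update/downdate routines, which are not shown here:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def damped_bfgs_update(B, s, y, theta_min=0.2):
    """One damped BFGS update (Powell's damping) of the Hessian
    approximation B. Damping guarantees s @ y_tilde >= theta_min * s @ B @ s
    > 0, exactly the curvature condition that keeps B positive definite."""
    Bs = B @ s
    sBs = s @ Bs          # > 0 as long as B is positive definite and s != 0
    sy = s @ y
    if sy >= theta_min * sBs:
        y_tilde = y       # curvature is fine; plain BFGS update
    else:
        theta = (1.0 - theta_min) * sBs / (sBs - sy)
        y_tilde = theta * y + (1.0 - theta) * Bs
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y_tilde, y_tilde) / (s @ y_tilde))

# Usage on one hypothetical step: solve B p = -grad through the Cholesky
# factorization, never forming B's inverse.
B = np.eye(2)
grad = np.array([1.0, -2.0])
p = cho_solve(cho_factor(B), -grad)
s = p                               # unit step (no line search in this sketch)
y = np.array([0.5, -1.0])           # hypothetical gradient difference
B = damped_bfgs_update(B, s, y)     # s @ y < 0 here, so damping engages
print(np.linalg.eigvalsh(B))        # eigenvalues stay positive
```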

Correct me if I'm wrong, but won't the "solution" in this case actually be a line rather than a single point? If x' is a minimizer of f(x), then the best you can hope for when applying any method to g(x1, x2) is convergence to some point on the line x2 = x' - x1, as the sketch below illustrates.
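A quick check of that observation (scipy.optimize and the particular f are my own choices for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# f(s) = (s - 3)**2, so x' = 3: every point on the line x1 + x2 = 3
# (i.e. x2 = x' - x1) minimizes g(x1, x2) = f(x1 + x2).
g = lambda x: (x[0] + x[1] - 3.0) ** 2

for x0 in ([0.0, 0.0], [10.0, -4.0], [-2.0, 7.0]):
    res = minimize(g, np.array(x0), method='BFGS')
    # Different starting points land on different minimizers, but the
    # sum x1 + x2 converges to x' = 3 in every case.
    print(res.x, res.x.sum())
```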
