
I have an optimization problem that involves minimizing a function whose gradient I know, but whose actual value at any given point is unknown.

I'd like to optimize the function using BFGS, but all of the BFGS implementations I've found seem to require the value of the objective, especially in the line search step. I've looked at both a Python (scipy) and a C++ implementation of BFGS.

Obviously I can use gradient descent, but I'd prefer not to reinvent the wheel here.

Any ideas?

Some more detail: I want to minimize $h$. But I'm not given $h$. What I'm given is $\nabla h = f(\nabla g)$, and an explicit formula for $g(x)$. $f$ basically transforms the gradients of $g$ in a kind of tricky geometric way that is not too difficult to calculate, but impossible to integrate. So, it's pretty straightforward to calculate the gradient of $h(x)$, but hard to get explicit values for $h(x)$.

agstudy
Robert T. McGibbon
  • Can you provide the definitions for f and g? If not, can you provide more information about h, e.g. is it convex? Is it n times continuously differentiable? – orizon Feb 01 '13 at 03:17

4 Answers


I believe you've reduced the problem to one of root finding. You could use one of the root finders in scipy; then you simply have to check whether that point is a minimum, a maximum, or an inflection point.
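For instance, here is a minimal sketch of that approach with scipy.optimize.root, where grad_h is a hypothetical stand-in for the known gradient of h (in the real problem it would be f applied to the gradients of g):

```python
import numpy as np
from scipy.optimize import root

# Hypothetical stand-in for the known gradient of h.
def grad_h(x):
    return np.array([2.0 * x[0], 4.0 * x[1] ** 3])

sol = root(grad_h, x0=np.array([1.0, 1.0]))
# sol.x is a stationary point of h; classify it (minimum, maximum,
# or saddle) by inspecting the gradient nearby or the Hessian.
print(sol.x)
```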

Bi Rico
  • I thought this was the answer, but on second thought I actually think it's not quite true. This is an extremely specialized root-finding problem, because in general we know that we want to go in the direction opposite the gradient. If we're just doing root finding, and all we know is that we want to find where the gradient is zero, we can really only get the search direction from the inverse Jacobian/Hessian. – Robert T. McGibbon Feb 08 '13 at 00:00

In that case, try minimizing the square of h(x). You are essentially searching for points where h(x) is close to zero, so you can convexify the search by squaring it and running your parameter search on that.

EDIT: sorry, what I meant was that the h(x) above should be the gradient of h(x), i.e. minimize the square of the gradient.
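A minimal sketch of that suggestion, assuming a toy grad_h in place of the real gradient: minimize the squared norm of the gradient, and let scipy finite-difference this surrogate (its exact derivative would require the Hessian of h, which isn't available here):

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the known gradient of h.
def grad_h(x):
    return 2.0 * x

# Squared gradient norm: zero exactly at the stationary points of h.
def grad_norm_sq(x):
    g = grad_h(x)
    return g @ g

# No jac= is given, so scipy approximates the surrogate's gradient
# by finite differences.
res = minimize(grad_norm_sq, np.array([3.0, -4.0]), method='BFGS')
print(res.x)  # near a stationary point of h
```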

Aditya Sihag
  • So your solution to their problem of not having h is to square h? – Dason Feb 01 '13 at 04:29
  • Moreover, the square of a function that is not convex is usually not convex either. For example, f(x) = sin(x). – orizon Feb 01 '13 at 04:32
  • maybe he meant to minimize the square of the gradient? – flodel Feb 01 '13 at 04:43
  • I think that this answer is quite good: instead of minimizing the functional h, just try to find x such that some norm of the gradient is zero. – Dr_Sam Feb 01 '13 at 14:06
  • This might work, but may not actually work very well -- see e.g. the relevant chapter of *Numerical Recipes* (chapter 10, I think), which points out that root finding is much harder and worse conditioned than minimization. But if you don't have a choice ... – Ben Bolker Feb 01 '13 at 21:59

After spending some time thinking about this, I think the answer is to just adapt a quasi-Newton method like BFGS. The only place the function value enters the BFGS computation is in the line search step, in the first Wolfe condition.

I think the solution is to instead use a line search method that doesn't check the first Wolfe condition (Armijo rule).
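A rough sketch of such a line search (a hypothetical illustration, not the code from the gist below): enforce only the strong curvature condition, which needs gradients but never a function value:

```python
import numpy as np

def curvature_only_line_search(grad, x, p, c2=0.9, alpha=1.0, max_iter=20):
    # Find a step length satisfying only the strong curvature condition
    # |grad(x + a*p) . p| <= c2 * |grad(x) . p|, by doubling/bisection.
    # No function values are needed. A sketch; not production-robust.
    g0p = grad(x) @ p          # negative for a descent direction
    lo, hi = 0.0, np.inf
    for _ in range(max_iter):
        gp = grad(x + alpha * p) @ p
        if abs(gp) <= c2 * abs(g0p):
            return alpha       # curvature condition satisfied
        if gp < 0:             # still descending along p: go farther
            lo = alpha
            alpha = 2.0 * alpha if np.isinf(hi) else 0.5 * (lo + hi)
        else:                  # gradient turned positive: overshot
            hi = alpha
            alpha = 0.5 * (lo + hi)
    return alpha               # give up; return the last trial step

# Toy usage with grad_h(x) = 2x and the steepest-descent direction:
step = curvature_only_line_search(lambda x: 2.0 * x,
                                  np.array([3.0, -4.0]),
                                  -2.0 * np.array([3.0, -4.0]))
```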

I implemented it for BFGS in Python and C++: https://gist.github.com/rmcgibbo/4735287. On second thought, though, I think you could get the same result by supplying the BFGS routine with a function that is always decreasing (e.g. it contains a counter tracking the number of times it has been called, and always returns a smaller number than it did the last time you called it). The decrease has to be big enough that you always pass the Armijo rule (http://en.wikipedia.org/wiki/Wolfe_conditions).
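Here is a minimal sketch of that counter trick, using a toy grad_h and scipy's BFGS. The size of the artificial per-call decrease is an assumption and may need tuning so that the Armijo test always passes; BFGS still terminates correctly because its stopping test is on the gradient norm:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the known gradient of h (here h(x) = ||x||^2 + C).
def grad_h(x):
    return 2.0 * x

class FakeObjective:
    # Returns a strictly decreasing sequence of values, so the line
    # search's sufficient-decrease (Armijo) test always passes.
    def __init__(self, step=10.0):
        self.calls = 0
        self.step = step  # per-call decrease; must be large enough

    def __call__(self, x):
        self.calls += 1
        return -self.step * self.calls

res = minimize(FakeObjective(), np.array([3.0, -4.0]),
               jac=grad_h, method='BFGS')
print(res.x)  # near the stationary point [0, 0]
```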

Robert T. McGibbon

Maybe talking about a simpler example will help. Take some scalar y = f(x). The gradient of y is df/dx. If you know the derivative everywhere, you can easily (!!) determine the value of f(x), either analytically or via numerical integration, but only up to an undeterminable global constant: the old "integral( f'(x) dx ) = f(x) + C" trick. So, unless you can anchor your h function at at least one point, you can't solve the problem. You can track down the location of the minimum (the x such that h(x) is minimal), but not the value of h(x) there.
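A 1-D toy illustration of this point (hypothetical, not the asker's actual h): quadrature on the known derivative recovers differences of the function, but never the constant:

```python
import numpy as np
from scipy.integrate import quad

# Suppose dh/dx = cos(x), so h(x) = sin(x) + C for some unknown C.
def dh_dx(x):
    return np.cos(x)

# Differences h(b) - h(a) are recoverable by integrating the derivative...
diff, _ = quad(dh_dx, 0.0, np.pi / 2)   # = sin(pi/2) - sin(0) = 1
# ...but the absolute value h(b) itself stays unknown without an
# anchor point fixing C.
print(diff)
```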

Carl Witthoft