
I am trying to maximise the log of an objective function by gradient ascent. I am observing an objective value sequence in which the values first increase and then start decreasing again. I wanted to know if this is possible: do functions exist whose ascent procedure passes through the maximum and then produces a decreasing sequence of values? Following is the link to the objective value sequence.

Value Sequence

Dynamite

4 Answers


To answer the general question: certainly. If your function is not differentiable, there is no guarantee that following the gradient will increase the function value. Consider, for instance, a function like -abs(x).

That said, unless you think your function might not be differentiable, I suspect Memming is correct that you have some error in your descent/ascent implementation, especially considering the way the values keep falling across many iterations.
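A minimal sketch (plain Python, my own toy setup, not the asker's code) of what goes wrong on -abs(x): the update follows a subgradient, so any fixed step eventually jumps across the kink at zero and the value goes down:

```python
# Gradient ascent on the non-differentiable f(x) = -|x|.
# -sign(x) is only a subgradient, so a fixed step overshoots the kink
# at x = 0 and the objective oscillates instead of rising monotonically.
def f(x):
    return -abs(x)

def grad(x):
    # Subgradient of -|x|; the choice at x == 0 is arbitrary.
    return -1.0 if x > 0 else 1.0

x, step = 0.1, 0.25  # step larger than |x|, so every update overshoots 0
for i in range(6):
    print(f"iter {i}: x = {x:+.2f}, f(x) = {f(x):+.2f}")
    x = x + step * grad(x)
```

Running this prints f values of -0.10, -0.15, -0.10, ..., so the "ascent" is not monotone even though each update follows the (sub)gradient exactly.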

Alex A.

The short answer is no, it is not possible, as long as the following conditions hold:

  • The objective function is differentiable (and if you are using any of the classical objectives, like log-likelihood, then it is)
  • You are using a small enough step size (in most cases, if you choose too big a step you should observe oscillations around some value rather than a consistent decrease, but a decrease is still possible)
  • Your objective function is iteration-independent (so it is only a function of the training set and does not change over time, although it can still measure the model's complexity to add regularization)
  • Your implementation is correct; this is the most probable explanation: either you calculate the gradient in the wrong way, or your gradient ascent algorithm has some bug

Note, though, that the required step size need not be constant for very complex functions. To ensure that your GA/GD converges to a stationary point with a constant step, you have to choose a step size smaller than 2/L, where L is the Lipschitz constant of your objective's gradient (i.e., the objective is L-smooth).
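A toy illustration of that condition (my own example, not from the question): for f(x) = -x^2 the gradient f'(x) = -2x is Lipschitz with constant L = 2, so any constant step below 2/L = 1 increases the objective at every iteration, while a larger step makes the values fall:

```python
# Gradient ascent on f(x) = -x^2 (L-smooth with L = 2).
def f(x):
    return -x * x

def grad(x):
    return -2.0 * x

for step in (0.4, 1.1):  # one safe step (< 2/L = 1), one too large
    x = 1.0
    values = []
    for _ in range(5):
        values.append(f(x))
        x = x + step * grad(x)
    print(f"step {step}: {[round(v, 4) for v in values]}")
```

The safe step prints a strictly increasing sequence approaching the maximum at 0; the too-large step prints values that keep falling.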

lejlot

If your objective function is deterministic, gradient ascent should always increase the objective at each step, provided an appropriately small step size is chosen and you are not already at the maximum. From your output, it seems your gradient implementation is incorrect. Try using the numerical gradient. It's slower to compute, but there's a lower chance of making a mistake.
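One way to do that check, sketched below with central differences (the function names here are mine, not from any particular library):

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x (a list of floats)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

# Compare a hand-derived gradient against the numerical one.
# Example: f(x) = x0^2 + 3*x1, whose true gradient is (2*x0, 3).
f = lambda x: x[0] ** 2 + 3 * x[1]
analytic = lambda x: [2 * x[0], 3.0]

x = [1.5, -0.7]
print("numerical:", numerical_gradient(f, x))
print("analytic: ", analytic(x))
```

If the two disagree beyond a few decimal places, the bug is almost certainly in the analytic gradient rather than in the ascent loop itself.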

Memming

If all of the following hold:

1) the objective is concave

2) the objective is differentiable

3) the step size is small enough

4) there isn't any bug

then gradient ascent should increase the objective monotonically.

If 1+2+4 hold, maybe try backtracking line search to set your step size.
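A minimal sketch of such a backtracking (Armijo) line search for ascent; the function and parameter names are mine, and beta and c are typical default values rather than prescribed ones:

```python
def backtracking_step(f, grad_x, x, step=1.0, beta=0.5, c=1e-4):
    """Shrink `step` until f(x + step*g) >= f(x) + c*step*||g||^2 (sufficient increase)."""
    fx = f(x)
    g_norm_sq = sum(g * g for g in grad_x)
    while f([xi + step * gi for xi, gi in zip(x, grad_x)]) < fx + c * step * g_norm_sq:
        step *= beta
    return step

# Usage on f(x) = -x0^2 - 2*x1^2, gradient (-2*x0, -4*x1):
f = lambda x: -x[0] ** 2 - 2 * x[1] ** 2
x = [3.0, -2.0]
for i in range(5):
    g = [-2 * x[0], -4 * x[1]]
    t = backtracking_step(f, g, x)
    x = [xi + t * gi for xi, gi in zip(x, g)]
    print(f"iter {i}: f(x) = {f(x):.6f}")
```

With conditions 2 and 4 in place, each accepted step satisfies the sufficient-increase test, so the printed objective values never decrease.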

ethanf
  • Concave does not imply differentiable! Just look at -|x| as an example. It is true that you don't need concavity for monotonicity; I was jumping ahead to a global maximum (which is normally the goal) – ethanf Sep 28 '13 at 07:12