
I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture components are modeled by regressions whose weights have to be optimized using the EM algorithm.

sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(
    func_to_minimize, self.sigma_vector[si][pj],
    args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
    approx_grad=True, bounds=[(1e-8, 0.5)],
    factr=1e2, pgtol=1e-5, epsilon=1e-8)

But sometimes I get the warning 'ABNORMAL_TERMINATION_IN_LNSRCH' in the information dictionary:

func_to_minimize value = 1.14462324063e-07
information dictionary: {'task': b'ABNORMAL_TERMINATION_IN_LNSRCH', 'funcalls': 147, 'grad': array([  1.77635684e-05,   2.87769808e-05,   3.51718654e-05,
         6.75015599e-06,  -4.97379915e-06,  -1.06581410e-06]), 'nit': 0, 'warnflag': 2}

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            6     M =           10
 This problem is unconstrained.

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.14462D-07    |proj g|=  3.51719D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    6      1     21      1     0     0   3.517D-05   1.145D-07
  F =  1.144619474757747E-007

ABNORMAL_TERMINATION_IN_LNSRCH                              

 Line search cannot locate an adequate point after 20 function
  and gradient evaluations.  Previous x, f and g restored.
 Possible causes: 1 error in function or gradient evaluation;
                  2 rounding error dominate computation.

 Cauchy                time 0.000E+00 seconds.
 Subspace minimization time 0.000E+00 seconds.
 Line search           time 0.000E+00 seconds.

 Total User time 0.000E+00 seconds.

I do not get this warning every time, only sometimes. (Most runs end with 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL' or 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'.)

I know that it means the minimum cannot be reached in this iteration. I googled the problem. Someone said it often occurs because the objective and gradient functions do not match. But here I do not provide a gradient function, because I am using 'approx_grad'.

What are the possible reasons that I should investigate? What does "rounding errors dominate computation" mean?
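To make the rounding-error question concrete, here is the kind of crude probe I could run (a sketch only; probe_flatness is an ad-hoc helper, not a scipy function): if a small step changes the objective by an amount comparable to floating-point noise, the finite-difference gradient that approx_grad=True builds is mostly noise.

import numpy as np

def probe_flatness(f, x, args=(), h=1e-8):
    # Crude heuristic: compare the change in f caused by a small step
    # against an (optimistic) floating-point noise floor near f(x).
    f0 = f(np.asarray(x, dtype=float), *args)
    noise = np.finfo(float).eps * abs(f0)   # ~2.2e-16 * |f(x)|
    for i in range(len(x)):
        xp = np.asarray(x, dtype=float).copy()
        xp[i] += h
        df = f(xp, *args) - f0
        print(f"dim {i}: |df| = {abs(df):.3e}, noise floor ~ {noise:.3e}")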

======

I also found that the log-likelihood does not increase monotonically:

########## Convergence !!! ##########
log_likelihood_history: [-28659.725891322563, 220.49993177669558, 291.3513633060345, 267.47745327823907, 265.31567762171181, 265.07311121000367, 265.04217683341682]

It usually starts decreasing at the second or third iteration, even when 'ABNORMAL_TERMINATION_IN_LNSRCH' does not occur. I do not know whether this problem is related to the previous one.

Munichong
  • I'm facing similar problems. They all seem to be centered on the gradient function that I give the optimizer. Do you know with 100% certainty that your gradient is completely correct? – jschabs Apr 14 '16 at 15:13
  • I am getting similar issues with L-BFGS when trying to maximize the log-likelihood of a function. I have to add that I am not passing the gradient of the function, but instead, I let L-BFGS approximate it. Sometimes I've able to overcome the issue by using the Nelder–Mead optimizer instead... Have you been able to solve this problem? – muammar Jun 17 '18 at 18:02
  • @muammar, in my experience of using L-BFGS, it only works well if you provide an explicit derivative function. Otherwise, it gets lost quite easily. – ap21 Sep 12 '18 at 02:47

4 Answers


Scipy calls the original L-BFGS-B implementation, which is Fortran 77 code (old but beautiful and super-fast), and our problem is that the descent direction is actually going up. The problem starts on line 2533 (link to the code at the bottom):

gd = ddot(n,g,1,d,1)
  if (ifun .eq. 0) then
     gdold=gd
     if (gd .ge. zero) then
c                               the directional derivative >=0.
c                               Line search is impossible.
        if (iprint .ge. 0) then
            write(0,*)' ascent direction in projection gd = ', gd
        endif
        info = -4
        return
     endif
  endif

In other words, you are telling it to go downhill while handing it an uphill direction. The code tries something called a line search a total of 20 times in the descent direction that you provide and realizes that you are NOT telling it to go downhill, but uphill. All 20 times.

The guy who wrote it (Jorge Nocedal, who by the way is a very smart guy) chose 20 because that is pretty much enough. Machine epsilon is about 2.2e-16, and I think 20 is actually a little too much. So, my money for most people having this problem is that your gradient does not match your function.
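To see this failure mode in isolation, here is a toy reproduction of my own (not the OP's setup): a sign-flipped gradient makes every "descent" direction point uphill for the actual function, so the line search burns through its 20 evaluations and gives up.

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    return np.sum(x ** 2)       # true minimum at x = 0

def bad_grad(x):
    return -2.0 * x             # sign flipped: directions built from this
                                # gradient actually point uphill for f

x, fval, info = fmin_l_bfgs_b(f, np.ones(3), fprime=bad_grad)
print(info['task'])             # expect b'ABNORMAL_TERMINATION_IN_LNSRCH'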

Now, it could also be that "2. rounding errors dominate computation". By this, he means that your function is a very flat surface on which increases are of the order of machine epsilon (in which case you could perhaps rescale the function). Now, I was thinking that maybe there should be a third option, when your function is too weird. Oscillations? I could see something like $\sin({\frac{1}{x}})$ causing this kind of problem. But I'm not a smart guy, so don't assume that there's a third case.
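If flatness is the culprit, the rescaling idea is cheap to try: multiplying the objective by a constant leaves the minimizer unchanged while lifting the function differences that the finite-difference code sees well above machine epsilon. A toy sketch (the function and the factor 1e7 are mine, purely illustrative):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def flat_f(x):
    return 1e-7 * np.sum((x - 3.0) ** 2)   # values of order 1e-7, as in the question

scale = 1e7                                 # chosen so the objective is O(1)
x, fval, info = fmin_l_bfgs_b(lambda x: scale * flat_f(x),
                              np.zeros(2), approx_grad=True)
print(x, info['task'])                      # same minimizer as flat_f itself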

So in the OP's case I think the explanation is that the function is too flat. Or look at the Fortran code.

https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb/lbfgsb.f

Here's line search for those who want to see it. https://en.wikipedia.org/wiki/Line_search

Note: this is 7 months too late. I put it here for future readers.

Wilmer E. Henao
  • 4
    This error message also is printed if the target function (or possibly gradient) becomes `nan`. – Ilya Kolpakov Nov 01 '18 at 19:54
  • 1
    "The code tries something called line search a total of 20 times in the descent direction that you provide and realizes that you are NOT telling it to go downhill, but uphill." –– Would that not mean that it has found a local optimum and should finish instead of throwing an error? – El Dude Nov 02 '18 at 18:13
  • Nocedal's code catches convergence in another place in the code. – Wilmer E. Henao Nov 12 '18 at 20:25

As pointed out in the answer by Wilmer E. Henao, the problem is probably in the gradient. Since you are using approx_grad=True, the gradient is calculated numerically. In that case, reducing the value of epsilon, the step size used for numerically calculating the gradient, can help.
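A minimal sketch of this suggestion, reusing the call from the question (the value 1e-10 is only an example; the default is epsilon=1e-8):

sigma_sp_new, func_val, info_dict = fmin_l_bfgs_b(
    func_to_minimize, self.sigma_vector[si][pj],
    args=(self.w_vectors[si][pj], Y, X, E_step_results[si][pj]),
    approx_grad=True, bounds=[(1e-8, 0.5)],
    factr=1e2, pgtol=1e-5,
    epsilon=1e-10)  # smaller finite-difference step than the default 1e-8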

toliveira

I also got the error "ABNORMAL_TERMINATION_IN_LNSRCH" using the L-BFGS-B optimizer.

My gradient function pointed in the right direction, but I had rescaled the actual gradient of the function by its L2 norm. Removing that rescaling, or adding another appropriate type of rescaling, worked. My guess is that before, the gradient was so large that it immediately went out of bounds.

If I read correctly, the problem from the OP was unbounded, so this will certainly not help in that setting. However, googling the error "ABNORMAL_TERMINATION_IN_LNSRCH" yields this page as one of the first results, so it might help others...
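For what it's worth, this kind of mismatch between a function and a rescaled gradient is easy to catch with scipy.optimize.check_grad, which compares a gradient function against finite differences. A minimal sketch (the toy function and values are mine, not from my actual problem):

import numpy as np
from scipy.optimize import check_grad

def f(x):
    return 0.5 * np.sum(x ** 2)

def grad_normalized(x):
    g = x                          # the true gradient of f ...
    return g / np.linalg.norm(g)   # ... rescaled to unit length: right
                                   # direction, wrong magnitude

x0 = np.array([10.0, -3.0, 2.0])
print(check_grad(f, grad_normalized, x0))  # large error: not the gradient of f
print(check_grad(f, lambda x: x, x0))      # ~1e-7: consistent with f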

gebbissimo

I had a similar problem recently: I sometimes encounter the ABNORMAL_TERMINATION_IN_LNSRCH message after using scipy's fmin_l_bfgs_b function. Here I try to give additional explanations of why I get it; I am looking for complementary details or corrections if I am wrong.

In my case, I provide the gradient function, so approx_grad=False. My cost function and the gradient are consistent; I double-checked, and the optimization actually works most of the time. When I get ABNORMAL_TERMINATION_IN_LNSRCH, the solution is not optimal, not even close (even if that is a subjective point of view). I can overcome this issue by modifying the maxls argument: increasing maxls helps to finally get the optimal solution. However, I noticed that sometimes a maxls smaller than the one that produces ABNORMAL_TERMINATION_IN_LNSRCH also results in a converging solution. A dataframe summarizes the results. I was surprised to observe this; I expected that reducing maxls would not improve the result. For this reason, I tried to read the paper describing the line search algorithm, but I had trouble understanding it.

The line "search algorithm generates a sequence of nested intervals {Ik} and a sequence of iterates αk ∈ Ik ∩ [αmin ; αmax] according to the [...] procedure". If I understand well, I would say that the maxls argument specifies the length of this sequence. At the end of the maxls iterations (or less if the algorithm terminates in fewer iterations), the line search stops. A final trial point is generated within the final interval Imaxls. I would say the the formula does not guarantee to get an αmaxls that respects the two update conditions, the minimum decrease and the curvature, especially when the interval is still wide. My guess is that in my case, after 11 iterations the generated interval I11 is such that a trial point α11 respects both conditions. But, even though I12 is smaller and still containing acceptable points, α12 is not. Finally after 24 iterations, the interval is very small and the generated αk respects the update conditions.

Is my understanding / explanation accurate? If so, I am surprised that when maxls=12, given that the generated α11 is acceptable but α12 is not, α11 is not chosen instead of α12.

Pragmatically, I would recommend trying a few higher values of maxls when getting ABNORMAL_TERMINATION_IN_LNSRCH, as in the sketch below.
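For reference, maxls is exposed as a keyword argument of fmin_l_bfgs_b and defaults to 20, the limit discussed in the accepted answer. A minimal sketch, with a toy problem standing in for the real cost function:

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):                      # toy stand-ins for the real cost ...
    return np.sum((x - 1.0) ** 2)

def grad_f(x):                 # ... and its (consistent) gradient
    return 2.0 * (x - 1.0)

x, fval, info = fmin_l_bfgs_b(f, np.zeros(5), fprime=grad_f,
                              maxls=50)  # allow more line-search steps than the default 20
if info['task'] == b'ABNORMAL_TERMINATION_IN_LNSRCH':
    print("line search still failing; try other maxls values")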

kapytaine