
I am using scipy.optimize.minimize to minimize a simple negative log-likelihood function. The inverse Hessian it returns (hess_inv) doesn't seem to behave well.

import numpy as np
import scipy.optimize as op

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*np.log(mu) - mu - 0.5*(scb2+scf2)

nll = lambda *args: -lnlike(*args)  # negative log-likelihood to minimize

myargs = (21.0, 20.0, 0.5, 6.0, 0.1)  # n, bhat, fhat, sigb, sigf

If the initial guess is at the minimum, the iteration doesn't go anywhere. That is fine in terms of the parameter values, but it doesn't touch the Hessian approximation either (hess_inv stays the identity), so I cannot use it for uncertainty estimation.

x0 = [2.0, 20.0, 0.5]  # initial guess is at the minimum
result = op.minimize(nll, x0, args=myargs)
print result

   status: 0
  success: True
     njev: 1
     nfev: 5
 hess_inv: array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
      fun: -42.934971192191881
        x: array([  2. ,  20. ,   0.5])
  message: 'Optimization terminated successfully.'
      jac: array([  0.00000000e+00,   0.00000000e+00,   9.53674316e-07])

If I change the initial guess a little bit, it seems to return a sensible hess_inv.

x0 = [2.01, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print result
print np.sqrt(result.hess_inv[0,0])

   status: 0
  success: True
     njev: 15
     nfev: 75
 hess_inv: array([[  2.16004477e+02,  -7.60588367e+01,  -2.94846112e-02],
       [ -7.60588367e+01,   3.55748024e+01,   2.74064505e-03],
       [ -2.94846112e-02,   2.74064505e-03,   9.98030944e-03]])
      fun: -42.934971191969964
        x: array([  1.99984604,  19.9999814 ,   0.5000001 ])
  message: 'Optimization terminated successfully.'
      jac: array([ -2.38418579e-06,  -5.24520874e-06,   1.90734863e-06])
14.697090757

However, hess_inv is very sensitive to the initial guess.

x0 = [2.02, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print result
print np.sqrt(result.hess_inv[0,0])

   status: 0
  success: True
     njev: 16
     nfev: 80
 hess_inv: array([[  1.82153214e+02,  -6.03482772e+01,  -2.97458789e-02],
       [ -6.03482772e+01,   3.30771459e+01,  -2.53811809e-03],
       [ -2.97458789e-02,  -2.53811809e-03,   9.99052952e-03]])
      fun: -42.934971192188634
        x: array([  1.9999702 ,  20.00000354,   0.50000001])
  message: 'Optimization terminated successfully.'
      jac: array([ -9.53674316e-07,  -4.76837158e-07,  -4.76837158e-07])
13.4964148462

If I change the initial guess a bit more:

x0 = [2.03, 20.0, 0.5]
result = op.minimize(nll, x0, args=myargs)
print result
print np.sqrt(result.hess_inv[0,0])

   status: 0
  success: True
     njev: 14
     nfev: 70
 hess_inv: array([[  2.30479371e+02,  -7.36087027e+01,  -3.79639119e-02],
       [ -7.36087027e+01,   3.55785937e+01,   3.54182478e-03],
       [ -3.79639119e-02,   3.54182478e-03,   9.97664441e-03]])
      fun: -42.93497119204827
        x: array([  1.99975148,  20.00006366,   0.50000009])
  message: 'Optimization terminated successfully.'
      jac: array([ -9.53674316e-07,  -9.53674316e-07,   4.29153442e-06])
15.1815470484

Did I miss something? Is this a bug or a feature?

physcheng

2 Answers


The way I understand the optimizers, the Hessian is approximated by finite differences. In your case, that does not seem to be the best idea. Perhaps utilizing Sympy (in IPython) will produce more usable results:

import sympy as sy
import numpy as np
import scipy.optimize as sopt

from IPython.display import display # nice printing

sy.init_printing()  # LaTeX like printing for IPython

def lnlike(theta, n, bhat, fhat, sigb, sigf):
    S, b, f = theta
    mu = f*S + b
    scb2 = ((b-bhat)/sigb)**2
    scf2 = ((f-fhat)/sigf)**2
    return n*sy.log(mu) - mu - (scb2+scf2) / 2

# declare symbols:
th_S, th_b, th_f = sy.symbols("theta_S, theta_b, theta_f", real=True)
theta = (th_S, th_b, th_f)
n, bhat, fhat = sy.symbols(r"n, \hat{b}, \hat{f}", real=True)
sigb, sigf = sy.symbols("sigma_b, sigma_f", real=True)

# symbolic objective function (negative log-likelihood):
lf = -lnlike(theta, n, bhat, fhat, sigb, sigf)
# Gradient:
dlf = sy.Matrix([lf.diff(th) for th in theta])
# Hessian:
Hlf = sy.Matrix([dlf.T.diff(th) for th in theta])

print("Symbolic Hessian:")
display(Hlf)

# Make numpy functions:
margs = {n:21, bhat:20, fhat:.5, sigb:6, sigf:.1} # parameters
lf_a, dlf_a, Hlf_a = lf.subs(margs), dlf.subs(margs), Hlf.subs(margs)
lf_lam =  sy.lambdify(theta,  lf_a, modules="numpy")
dlf_lam = sy.lambdify(theta, dlf_a, modules="numpy")
Hlf_lam = sy.lambdify(theta, Hlf_a, modules="numpy")

nlf = lambda xx: np.array(lf_lam(xx[0], xx[1], xx[2]))  # function
ndlf = lambda xx: np.array(dlf_lam(xx[0], xx[1], xx[2])).flatten()  # gradient
nHlf = lambda xx: np.array(Hlf_lam(xx[0], xx[1], xx[2]))  # Hessian

x0 = [2.02, 20.0, 0.5]
rs = sopt.minimize(nlf, x0, jac=ndlf, hess=nHlf, method='Newton-CG')
print(rs)
print("Hessian:")
print(nHlf(rs.x))
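
To connect this back to the uncertainty estimate in the question, a short follow-up sketch that inverts the analytic Hessian at the optimum and takes the square roots of its diagonal (reusing rs and nHlf from the code above):

cov = np.linalg.inv(nHlf(rs.x))  # covariance estimate: inverse of the analytic Hessian
print(np.sqrt(np.diag(cov)))     # 1-sigma uncertainties, cf. sqrt(hess_inv[0,0]) in the question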
Dietrich
  • Thanks. This is nice. However, it looks like this only works for simple cases where analytical differentials are available. Is there a method that would produce a more accurate hessian numerically? – physcheng Apr 04 '15 at 23:25
  • 1
    As long as you can express your likelihood as a formula, I'm quite sure that Sympy can calculate a Hessian, given that it exists. If you want robust numeric approaches, you need to have some knowledge about the smoothness of your function to choose an appropriate differentiator. Another standard technique is called "Automatic differentiation" (https://en.wikipedia.org/wiki/Automatic_differentiation). For interpreted languages like Python I don't see advantages of Automatic differentiation techniques over Sympy (when utilizing sy.symplify). – Dietrich Apr 05 '15 at 00:24
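
If a purely numerical Hessian is wanted (one of the "robust numeric approaches" mentioned in the comment above), the third-party numdifftools package estimates derivatives with adaptive finite differences. A minimal sketch, assuming numdifftools is installed, with the parameter values from the question fixed inside the function:

import numpy as np
import numdifftools as nd

def nll(theta, n=21.0, bhat=20.0, fhat=0.5, sigb=6.0, sigf=0.1):
    # negative log-likelihood from the question, parameters fixed via defaults
    S, b, f = theta
    mu = f*S + b
    return -(n*np.log(mu) - mu - 0.5*(((b-bhat)/sigb)**2 + ((f-fhat)/sigf)**2))

xmin = np.array([2.0, 20.0, 0.5])          # the minimum found by op.minimize
H = nd.Hessian(nll)(xmin)                  # numerical Hessian at the minimum
print(np.sqrt(np.diag(np.linalg.inv(H))))  # parameter uncertainties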

If you're using a quasi-Newton method, which, from the documentation, it appears you are:

Quasi-Newton methods build up a guess at the Hessian inverse by applying a sequence of low-rank updates to a completely naive guess (typically a multiple of the identity). The low-rank updates used are in some sense the "least-change" updates that make a given equation hold, and the meaning of "least-change" varies with the quasi-Newton method chosen. If you start at, or very close to, the minimiser, the optimiser will figure this out very quickly and it won't build up much information in its approximation to the Hessian inverse.
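
For concreteness, here is a rough sketch of the textbook BFGS update of the inverse-Hessian approximation described above (BFGS is what minimize defaults to when no bounds or constraints are given); this is not scipy's internal code, just the standard formula:

import numpy as np

def bfgs_update(H, s, y):
    # H: current approximation to the inverse Hessian
    # s: step taken, x_new - x_old
    # y: change in gradient, grad_new - grad_old (the update assumes y.dot(s) > 0)
    rho = 1.0 / np.dot(y, s)
    I = np.eye(len(s))
    return (I - rho*np.outer(s, y)).dot(H).dot(I - rho*np.outer(y, s)) + rho*np.outer(s, s)

If you start exactly at the minimiser, s and y are essentially zero, so no informative update is ever applied and hess_inv stays at its initial identity value, which is what the question observes.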

tmyklebu
  • Thank you for the explanation. Is there a method that would produce a more accurate Hessian matrix? – physcheng Apr 04 '15 at 23:27
  • @physcheng: You can usually just compute the thing symbolically. Keep in mind that storing the Hessian or Hessian inverse will take space quadratic in the number of variables; if you are training models with tons of parameters, this quickly becomes useless. If not, it's generally not too hard to compute the Hessian by hand. – tmyklebu Apr 05 '15 at 00:10