3

I am currently using the minimize(BFGS) function from SciPy.optimize to calibrate my model. Once calibrated the parameters will be just slightly perturbed from time to time and I would like to reuse the Jacobian and IHessian from the previous calibration for my initial ste( a kind of initial guess for jac and hessian and then switch to regular calculation for these)

Any idea how to do it?

Dave
  • 1,835
  • 4
  • 26
  • 44

2 Answers2

1

You will have to modify scipy slightly to do this. The Hessian approximation for BFGS is stored in the variable Hk, first defined in line 1308 of the function _minimize_bfgs() in the file _optimize.py here: https://github.com/scipy/scipy/blob/65d8361519e3f1166d81dceef9f1d6375b4ed808/scipy/optimize/_optimize.py#L1308

You will have to modify scipy's code return or store this matrix, and pass it to the function again as an argument next time.


Alternatively, you can use scipy's nonlinear root finding instead, because minimizing a function is the same as finding the point, x, where the gradient, g(x), of the function is zero. The caveat is that this is much less robust from an algorithm perspective. The root finder may converge to saddle points or maxima, etc.

The BFGS algorithm for optimization is very similar to Broyden's method for finding the roots of a nonlinear function. Broyden is the B in BFGS, and BFGS is basically a modified version Broydens root finding method applied to the equation g(x)=0.

From a code perspective, scipy provides better access to the internals of the Broyden's nonlinear root finder than it does to BFGS. The Broyden Jacobian matrix (which is the analog of the BFGS Hessian) is stored as an object of type "Jacobian" which can be re-used if you over-write the "setup()" method to do nothing, so it won't reset. For example:

import numpy as np
from scipy.optimize import nonlin, rosen, rosen_der


x0 = np.array([0.0, 0.0])

counter = [1]
def callback(x, Fx):
    print('k=', counter[0], '   xk=', x, '   f(xk)=', Fx)
    counter[0] += 1

print('first solve:')
jac = nonlin.BroydenFirst()
sol1 = nonlin.nonlin_solve(rosen_der, x0, jacobian=jac, callback=callback, line_search='wolfe')

print('')
print('Jacobian after first solve:')
print(jac.todense())


print('')
print('second solve:')
jac.setup = lambda x,y,z: None # Prevent Jacobian from resetting. Try commenting this line out!
counter = [1]
sol2 = nonlin.nonlin_solve(rosen_der, x0, jacobian=jac, callback=callback, line_search='wolfe')

print('')
print('Jacobian after second solve:')
print(jac.todense())

This yields the following output:

first solve:
k= 1    xk= [-0.5  0. ]    f(xk)= [-53. -50.]
k= 2    xk= [-0.32283933  0.16713271]    f(xk)= [ 5.47792447 12.58149553]
k= 3    xk= [-0.28061925  0.08236319]    f(xk)= [-2.1553475  0.7232059]
k= 4    xk= [-0.21641127  0.02839649]    f(xk)= [-4.02884252 -3.68746961]
k= 5    xk= [-0.18109429  0.02229409]    f(xk)= [-3.12286088 -2.10021066]
k= 6    xk= [-0.19320298  0.02730297]    f(xk)= [-3.16110526 -2.00488442]
k= 7    xk= [-0.1500595   0.01281555]    f(xk)= [-2.88248831 -1.94046136]
k= 8    xk= [-0.14063472  0.00987684]    f(xk)= [-2.83825517 -1.98025685]
k= 9    xk= [-0.10559036  0.00122285]    f(xk)= [-2.63043649 -1.98529385]
k= 10    xk= [-0.07978769 -0.00399875]    f(xk)= [-2.49036939 -2.07296403]
k= 11    xk= [-0.04033837 -0.00840453]    f(xk)= [-2.24254189 -2.00634254]
k= 12    xk= [-0.01281811 -0.01000148]    f(xk)= [-2.07775868 -2.03315754]
k= 13    xk= [ 0.04612425 -0.00598848]    f(xk)= [-1.75801514 -1.62318494]
k= 14    xk= [ 0.03215952 -0.00653348]    f(xk)= [-1.83833134 -1.51354257]
k= 15    xk= [0.08036797 0.0021266 ]    f(xk)= [-1.6999891  -0.86648289]
k= 16    xk= [-0.04367326 -0.01098396]    f(xk)= [-2.31254872 -2.57826183]
k= 17    xk= [-0.07820406 -0.00365169]    f(xk)= [-2.46195331 -1.95351227]
k= 18    xk= [-0.02940764 -0.00337684]    f(xk)= [-2.10871    -0.84832937]
k= 19    xk= [-0.0314977  -0.00316138]    f(xk)= [-2.11532548 -0.83069659]
k= 20    xk= [-0.00053675 -0.00472907]    f(xk)= [-2.00208887 -0.94587179]
k= 21    xk= [ 0.01086898 -0.00493684]    f(xk)= [-1.95628507 -1.01099561]
k= 22    xk= [ 0.03866879 -0.00408101]    f(xk)= [-1.83641115 -1.11525685]
k= 23    xk= [ 0.06232922 -0.00227391]    f(xk)= [-1.72179137 -1.23176741]
k= 24    xk= [0.09084425 0.0019575 ]    f(xk)= [-1.5895594 -1.2590346]
k= 25    xk= [0.11524109 0.00670698]    f(xk)= [-1.46650132 -1.3147068 ]
k= 26    xk= [0.17879539 0.0246649 ]    f(xk)= [-1.12012011 -1.46057767]
k= 27    xk= [0.17505925 0.02929594]    f(xk)= [-1.5553635  -0.26996006]
k= 28    xk= [0.18893989 0.03342207]    f(xk)= [-1.4500936  -0.45524167]
k= 29    xk= [0.22546628 0.04688035]    f(xk)= [-1.19240743 -0.79093866]
k= 30    xk= [0.23877529 0.05259369]    f(xk)= [-1.10029946 -0.88399008]
k= 31    xk= [0.25428437 0.06010509]    f(xk)= [-1.02807924 -0.91109027]
k= 32    xk= [0.27116407 0.06871439]    f(xk)= [-0.9353492  -0.96311185]
k= 33    xk= [0.28827306 0.07809215]    f(xk)= [-0.84584567 -1.00184213]
k= 34    xk= [0.30453673 0.08760267]    f(xk)= [-0.76480491 -1.0279903 ]
k= 35    xk= [0.32072413 0.09764762]    f(xk)= [-0.68934851 -1.04326926]
k= 36    xk= [0.33639066 0.10790526]    f(xk)= [-0.6203388  -1.05068295]
k= 37    xk= [0.35201809 0.11866338]    f(xk)= [-0.55625337 -1.05067108]
k= 38    xk= [0.36724683 0.12964277]    f(xk)= [-0.49759833 -1.04549305]
k= 39    xk= [0.38246707 0.14110658]    f(xk)= [-0.44343805 -1.03489669]
k= 40    xk= [0.39728775 0.15273218]    f(xk)= [-0.39410312 -1.02107526]
k= 41    xk= [0.41215838 0.16485939]    f(xk)= [-0.34886931 -1.00302937]
k= 42    xk= [0.42650931 0.17699212]    f(xk)= [-0.30793999 -0.9836144 ]
k= 43    xk= [0.44109504 0.18976381]    f(xk)= [-0.27072811 -0.9602033 ]
k= 44    xk= [0.45477201 0.20212697]    f(xk)= [-0.23719208 -0.93812269]
k= 45    xk= [0.4681153  0.21458789]    f(xk)= [-0.21291526 -0.9088083 ]
k= 46    xk= [0.47908973 0.22505828]    f(xk)= [-0.1854592  -0.89373795]
k= 47    xk= [0.50398499 0.2497432 ]    f(xk)= [-0.13370942 -0.85153389]
k= 48    xk= [0.51439976 0.26044589]    f(xk)= [-0.11498655 -0.83224564]
k= 49    xk= [0.53852539 0.28603837]    f(xk)= [-0.06750659 -0.79424541]
k= 50    xk= [0.55785196 0.30748713]    f(xk)= [-0.05606785 -0.74233693]
k= 51    xk= [0.56092826 0.31093463]    f(xk)= [-0.0466508  -0.74117561]
k= 52    xk= [0.61261365 0.37369717]    f(xk)= [-0.38311202 -0.31966368]
k= 53    xk= [0.52465657 0.27398909]    f(xk)= [-0.68302112 -0.25508661]
k= 54    xk= [0.74386124 0.52529992]    f(xk)= [ 7.82778407 -5.60592561]
k= 55    xk= [0.54293777 0.30014958]    f(xk)= [-2.07995492  1.07363175]
k= 56    xk= [0.58511821 0.34990237]    f(xk)= [-2.59425851  1.50781064]
k= 57    xk= [0.35868093 0.08830817]    f(xk)= [ 4.50558888 -8.06876881]
k= 58    xk= [0.47983856 0.24586376]    f(xk)= [-4.0381075   3.12374291]
k= 59    xk= [0.3775732 0.1412489]    f(xk)= [-1.04660917 -0.26252449]
k= 60    xk= [0.36511619 0.13121294]    f(xk)= [-0.96352471 -0.41937734]
k= 61    xk= [ 0.1114212  -0.05351408]    f(xk)= [  1.16118684 -13.18575168]
k= 62    xk= [0.25437135 0.02356053]    f(xk)= [ 2.69511008 -8.22885014]
k= 63    xk= [0.16777908 0.02828527]    f(xk)= [-1.67353225  0.0270904 ]
k= 64    xk= [0.21546576 0.04636992]    f(xk)= [-1.56427885 -0.01111462]
k= 65    xk= [0.24867847 0.0590896 ]    f(xk)= [-1.22895955 -0.55027587]
k= 66    xk= [0.26241736 0.06557769]    f(xk)= [-1.13033017 -0.65703565]
k= 67    xk= [0.27594487 0.07265633]    f(xk)= [-1.06297502 -0.69784814]
k= 68    xk= [0.29626637 0.08383401]    f(xk)= [-0.94058127 -0.78794969]
k= 69    xk= [0.31324088 0.09384403]    f(xk)= [-0.83777351 -0.85516414]
k= 70    xk= [0.3305509  0.10477546]    f(xk)= [-0.74543613 -0.89768638]
k= 71    xk= [0.34571472 0.11489762]    f(xk)= [-0.66954478 -0.9242097 ]
k= 72    xk= [0.36107381 0.12568167]    f(xk)= [-0.60009902 -0.9385247 ]
k= 73    xk= [0.37526559 0.13609304]    f(xk)= [-0.53928208 -0.94624546]
k= 74    xk= [0.38966873 0.14710916]    f(xk)= [-0.48301011 -0.94651221]
k= 75    xk= [0.40322511 0.15787327]    f(xk)= [-0.43270865 -0.94344462]
k= 76    xk= [0.41704718 0.16925377]    f(xk)= [-0.3860966  -0.93491704]
k= 77    xk= [0.4300686  0.18033373]    f(xk)= [-0.34418922 -0.92505424]
k= 78    xk= [0.44351101 0.1921499 ]    f(xk)= [-0.30541382 -0.9104218 ]
k= 79    xk= [0.45596    0.20341705]    f(xk)= [-0.27054816 -0.89649514]
k= 80    xk= [0.46923896 0.2158001 ]    f(xk)= [-0.23845909 -0.87701904]
k= 81    xk= [0.48085841 0.22691624]    f(xk)= [-0.2095579  -0.86171446]
k= 82    xk= [0.49450418 0.24035192]    f(xk)= [-0.18369471 -0.83649135]
k= 83    xk= [0.50433095 0.25022809]    f(xk)= [-0.15987461 -0.82432328]
k= 84    xk= [0.52923466 0.27606143]    f(xk)= [-0.08884812 -0.80558078]
k= 85    xk= [0.54817807 0.29670844]    f(xk)= [-0.07244057 -0.75815082]
k= 86    xk= [0.55196258 0.30088075]    f(xk)= [-0.06107966 -0.7563875 ]
k= 87    xk= [0.58409242 0.3383646 ]    f(xk)= [-0.17778262 -0.55987076]
k= 88    xk= [0.57415074 0.32741847]    f(xk)= [-0.33941812 -0.44612013]
k= 89    xk= [0.5759845  0.32975938]    f(xk)= [-0.38752971 -0.39975147]
k= 90    xk= [0.54936923 0.30264256]    f(xk)= [-1.08497164  0.16720094]
k= 91    xk= [0.59588566 0.35541025]    f(xk)= [-0.88701248  0.06610646]
k= 92    xk= [0.62557699 0.38945577]    f(xk)= [-0.27571094 -0.37815896]
k= 93    xk= [0.62659512 0.39086428]    f(xk)= [-0.30639535 -0.35143459]
k= 94    xk= [0.60750698 0.37014954]    f(xk)= [-1.04859775  0.21696188]
k= 95    xk= [0.63679781 0.40648354]    f(xk)= [-0.97401477  0.19441837]
k= 96    xk= [0.67603801 0.4552325 ]    f(xk)= [-0.16255821 -0.35897816]
k= 97    xk= [0.67756175 0.457579  ]    f(xk)= [-0.23538087 -0.30218326]
k= 98    xk= [0.66719264 0.44635928]    f(xk)= [-0.98940486  0.24265117]
k= 99    xk= [0.68116519 0.46516345]    f(xk)= [-0.95848113  0.23548731]
k= 100    xk= [0.72613353 0.52567098]    f(xk)= [-0.08332021 -0.31978466]
k= 101    xk= [0.72763167 0.52819241]    f(xk)= [-0.17934169 -0.25108513]
k= 102    xk= [0.72088263 0.52080286]    f(xk)= [-0.88438909  0.22621877]
k= 103    xk= [0.72930705 0.53299447]    f(xk)= [-0.86394012  0.22113746]
k= 104    xk= [0.77143759 0.59395855]    f(xk)= [-0.09998023 -0.2314799 ]
k= 105    xk= [0.772421   0.59561435]    f(xk)= [-0.14005778 -0.20396923]
k= 106    xk= [0.7641847  0.58495593]    f(xk)= [-0.77048152  0.19553579]
k= 107    xk= [0.77477743 0.60123328]    f(xk)= [-0.74585811  0.19064376]
k= 108    xk= [0.81159937 0.65777517]    f(xk)= [-0.07866333 -0.1836731 ]
k= 109    xk= [0.81222326 0.65888019]    f(xk)= [-0.10705352 -0.16528704]
k= 110    xk= [0.80623427 0.65086575]    f(xk)= [-0.66231087  0.17040917]
k= 111    xk= [0.81361762 0.66280639]    f(xk)= [-0.64378248  0.16655104]
k= 112    xk= [0.84654107 0.71601533]    f(xk)= [-0.09817672 -0.12329062]
k= 113    xk= [0.85700521 0.73465556]    f(xk)= [-0.35373609  0.03952514]
k= 114    xk= [0.83758317 0.70213926]    f(xk)= [-0.52374073  0.1187387 ]
k= 115    xk= [0.90388038 0.81436146]    f(xk)= [ 0.76163845 -0.52765703]
k= 116    xk= [0.86767736 0.75453523]    f(xk)= [-0.84467781  0.33424436]
k= 117    xk= [0.90054052 0.81082101]    f(xk)= [-0.1440905  -0.03044198]
k= 118    xk= [0.90932277 0.82694243]    f(xk)= [-0.20846799  0.01490863]
k= 119    xk= [0.87018732 0.75667299]    f(xk)= [-0.06714585 -0.1105966 ]
k= 120    xk= [0.82135459 0.67121157]    f(xk)= [ 0.76362535 -0.68235826]
k= 121    xk= [0.86037593 0.74128628]    f(xk)= [-0.63700442  0.20790696]
k= 122    xk= [0.83873931 0.70465215]    f(xk)= [-0.71455427  0.23370366]
k= 123    xk= [1.03860171 1.04308697]    f(xk)= [14.86961426 -7.12131062]
k= 124    xk= [0.84811163 0.7215493 ]    f(xk)= [-1.06909728  0.45119093]
k= 125    xk= [0.86097865 0.74383034]    f(xk)= [-1.15489858  0.50922045]
k= 126    xk= [0.66388223 0.40558191]    f(xk)= [ 8.66399552 -7.03154158]
k= 127    xk= [0.83144109 0.69757602]    f(xk)= [-2.42627201  1.25634529]
k= 128    xk= [0.78711785 0.62687425]    f(xk)= [-2.73036387  1.46394824]
k= 129    xk= [1.21594817 1.31516318]    f(xk)= [ 79.89010983 -32.67335542]
k= 130    xk= [0.80183352 0.64738549]    f(xk)= [-1.82311642  0.88970056]
k= 131    xk= [0.81130929 0.66135898]    f(xk)= [-1.39515741  0.62724291]
k= 132    xk= [0.84379242 0.71145948]    f(xk)= [-0.13482093 -0.10523573]
k= 133    xk= [0.8488667  0.72108011]    f(xk)= [-0.47388572  0.1010872 ]
k= 134    xk= [0.83939571 0.70529745]    f(xk)= [-0.56036629  0.14245826]
k= 135    xk= [0.90713661 0.81941977]    f(xk)= [ 1.07593817 -0.69541067]
k= 136    xk= [0.86501385 0.74991553]    f(xk)= [-0.84661343  0.33331324]
k= 137    xk= [0.89854797 0.80733425]    f(xk)= [-0.18342061 -0.01084162]
k= 138    xk= [0.91010896 0.82821754]    f(xk)= [-0.15037671 -0.01615486]
k= 139    xk= [0.92061081 0.84725285]    f(xk)= [-0.05883515 -0.05428094]
k= 140    xk= [0.93145142 0.8677041 ]    f(xk)= [-0.17522727  0.02046812]
k= 141    xk= [0.90995165 0.82823094]    f(xk)= [-0.25978644  0.04378789]
k= 142    xk= [0.9791429  0.95553256]    f(xk)= [ 1.20698881 -0.63765106]
k= 143    xk= [0.92368943 0.85431748]    f(xk)= [-0.56470048  0.22306163]
k= 144    xk= [0.9419506 0.8881915]    f(xk)= [-0.46295033  0.18411344]
k= 145    xk= [0.9577381  0.91746347]    f(xk)= [-0.16160146  0.04023942]
k= 146    xk= [0.96701862 0.93506488]    f(xk)= [-0.04270618 -0.01202489]
k= 147    xk= [0.97141166 0.94372517]    f(xk)= [-0.0900311   0.01691066]
k= 148    xk= [0.96094308 0.92354637]    f(xk)= [-0.12991058  0.026951  ]
k= 149    xk= [0.99570993 0.99061839]    f(xk)= [ 0.3179641  -0.16397559]
k= 150    xk= [0.97199417 0.94517136]    f(xk)= [-0.21102069  0.07973764]
k= 151    xk= [0.98787082 0.97590466]    f(xk)= [-0.03054315  0.00318098]
k= 152    xk= [0.99093975 0.98195051]    f(xk)= [-0.01372648 -0.00221709]
k= 153    xk= [0.99398549 0.98801939]    f(xk)= [-0.016895    0.00244771]
k= 154    xk= [0.95542339 0.91169344]    f(xk)= [ 0.3466759  -0.22808167]
k= 155    xk= [0.99166933 0.98353597]    f(xk)= [-0.06740292  0.02558392]
k= 156    xk= [0.9849434  0.97039446]    f(xk)= [-0.14080314  0.05619102]
k= 157    xk= [0.99794854 0.99583192]    f(xk)= [ 0.02359149 -0.01387567]
k= 158    xk= [0.99621505 0.99248275]    f(xk)= [-0.02284649  0.00766731]
k= 159    xk= [0.99710996 0.99424387]    f(xk)= [-0.0119997   0.00311882]
k= 160    xk= [0.99816951 0.99634977]    f(xk)= [-0.00661416  0.0014793 ]
k= 161    xk= [0.99951767 0.99903483]    f(xk)= [-0.00066891 -0.00014795]
k= 162    xk= [0.99970257 0.99940648]    f(xk)= [-0.001095    0.00025015]
k= 163    xk= [0.99900072 0.99800724]    f(xk)= [-0.00391391  0.00095864]
k= 164    xk= [0.99997795 0.99995574]    f(xk)= [ 1.74913071e-05 -3.08012179e-05]
k= 165    xk= [0.99997761 0.99995513]    f(xk)= [-8.70938135e-06 -1.80379225e-05]
k= 166    xk= [0.99997777 0.99995565]    f(xk)= [-9.03605856e-05  2.29477018e-05]
k= 167    xk= [0.99997745 0.99995502]    f(xk)= [-9.16328188e-05  2.32707922e-05]
k= 168    xk= [1. 1.]    f(xk)= [ 2.02059708e-07 -1.00387831e-07]

Jacobian after first solve:
[[ 116.48889476  -56.35457991]
 [-225.91025031  112.73024426]]

second solve:
k= 1    xk= [0.56260435 1.12745333]    f(xk)= [-183.36781519  162.18593492]
k= 2    xk= [1. 1.]    f(xk)= [ 9.83445546e-08 -5.00097297e-08]

Jacobian after second solve:
[[ 380.39702492 -133.25513499]
 [-329.18184311  142.82269653]]

You can see it takes 168 iterations the first time, and only 2 iterations the second time when re-using the Jacobian from the first time.

I found that using the "Wolfe conditions" linesearch, and using "Broyden's first method" (rather than Broyden's second method) tends to make the solver more robust here.

Nick Alger
  • 984
  • 7
  • 26
0

I think it should be possible using a decorator. See below. Depending on the tolerance or whether the gradient function is being called the first time, the gradient is either recomputed or returned from cache.

import numpy as np
from scipy.optimize import minimize

aa = 3.14

#decorator
def smartgrad(func):
    def wrapped(*args,**kwargs):
        distance = abs(args[0]-wrapped.oldstate)
        if ( wrapped.firstcall == True) or (distance > wrapped.tolerance):
            print('recomputing...')
            newgrad = func(*args,**kwargs)
            wrapped.firstcall = False
            wrapped.oldgrad   = newgrad
            wrapped.oldstate  = args[0]
            return newgrad
        else:
            print('returning from cache...')
            return wrapped.oldgrad
    
    wrapped.firstcall = True
    wrapped.tolerance = 1E-1
    wrapped.oldgrad   = 0.0
    wrapped.oldstate  = 0.0
    
    return wrapped


def ff(xx):
    return (xx-aa)**2

@smartgrad
def grad(xx):
    return 2*(xx-aa)

if __name__ =='__main__':
    gg=grad(1.0);       # gradient is recomputed 
    print(gg)
    gg=grad(1.05)       # gradient is returned from cache
    print(gg)
    gg=grad(1.11)       # gradient is recomputed
    print(gg)
NNN
  • 697
  • 1
  • 5
  • 15
  • looks interesting. But I am wondering how to use the genuine grad calculation of scipy once the very first step has been done. I think it using some kind of internal method or object but I can not find it . (this is to replace your grad function with a numerical procedure) – Dave Dec 24 '20 at 10:26
  • What do you mean 'genuine grad calculation of scipy'? Do you mean finite difference calculation? – NNN Dec 24 '20 at 10:28
  • yes , after the initial step, I want to revert to what scipy does . I do not know exactly what scipy does but I think you are right about finite difference. – Dave Dec 25 '20 at 13:56