Scipy.Optimize.Minimize inefficient? Double calls to cost/gradient function

Question

I'm relatively new to using SciPy; I'm currently using it to minimize a cost function for a multi-layer-perceptron model. I can't use scikit-learn because I need to have the ability to set the coefficients (they are read-only in the MLPClassifier) and add random permutations and noise to any and all parameters. I haven't finished the implementation quite yet, but I am confused about the parameters required for the minimize function.

For example, I have written a function that calculates the "cost" (the energy to minimize) of the model and calculates the gradient at the same time. That's nothing special, as it's common practice. However, scipy.optimize.minimize asks for two different functions: one that returns the scalar to be minimized (i.e., the cost in my case) and one that calculates the gradient at the current parameter values. Example:

j,grad = myCostFunction(X,y)

Unless I am mistaken, it seems that it would need to call my function twice, with each call needing to be specified to return either the cost or the gradient, like so:

opt = scipy.optimize.minimize(fun=myJFunction, jac=myGradFunction, args = args,...)

Isn't this a waste of computation time? My data set will be > 1 million samples and 10ish features, so reducing redundant computation would be preferred since I will be training and retraining this thing tens of thousands of times for my project.

Another point of confusion is with the args input. Are the arguments passed like this:

# This is what I expect happens
myJFunction(x0,*args)
myGradFunction(x0,*args)

or like this:

# This is what I wish it did
myJFunction(x0,arg0,arg1,arg2)
myGradFunction(x0,arg3,arg4,arg5)

Thanks in advance!

Yes, there is a cost to calculating the `jac`, but it may be worth it if it doesn't have to call the `fun` as many times. Using the gradient allows it to take bigger steps. Those kinds of trade offs have been studied at the theoretical level as well as in coding. — hpaulj, Oct 09 '19 at 16:59
The two definitions do the same thing. As long as you are using 3 `args` elements, `fun(x0, *args)` and `fun(x0, a,b,c)` will both work. You can use `a,b,c = args` to unpack the `*args` parameter. — hpaulj, Oct 09 '19 at 17:05
@hpaulj Thanks for your response. To clarify, are you saying that if I specify `jac` that it will not call `fun` as often? I had assumed it would call both for every iteration. If that is the case, then I don't have to worry as much as I thought. I guess perusing the source code would help clarify things more. — ddjanke, Oct 09 '19 at 17:34
It has to be called for each iteration, but, possibly, it won't have to do as many iterations. You may have to play around with the parameters to find which combination is optimal for your problem. — hpaulj, Oct 09 '19 at 18:22

ddjanke · Answer 1 · 2019-10-15T14:51:44.060

After doing some experimentation and searching, I found the answers to my own questions.

While I can't say for sure about scipy.optimize.minimize, the documentation for other optimization functions (for example, scipy.optimize.fmin_tnc) explicitly states that the callable func can either (1) return both the energy and the gradient, (2) return only the energy, with the gradient supplied by a separate function passed as fprime (slower), or (3) return only the energy and have the optimizer estimate the gradient numerically through perturbation (much slower).

See the docs here: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.optimize.fmin_tnc.html
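As a minimal sketch of option (1), here is fmin_tnc called with a single function that returns both values. The least-squares cost, the synthetic data, and the names cost_and_grad, X, y and w0 are stand-ins I made up for illustration; they are not the MLP cost from the question.

```python
import numpy as np
from scipy.optimize import fmin_tnc

def cost_and_grad(w, X, y):
    """Hypothetical least-squares cost; returns (cost, gradient) in one pass."""
    residual = X @ w - y
    cost = 0.5 * residual @ residual
    grad = X.T @ residual
    return cost, grad

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -1.0])
w0 = np.zeros(2)

# With fprime left as None and approx_grad left off, fmin_tnc expects
# func to return the cost and the gradient together.
w_opt, n_evals, return_code = fmin_tnc(cost_and_grad, w0, args=(X, y), messages=0)
print(w_opt, n_evals, return_code)
```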

I was very happy to see that I could use a single function to return both values. I assume the same is true for the minimize function, but I have not tested it to be sure (see Edit 1).

As for my second question, if you specify two different functions, the same args tuple is passed to both of them; you cannot specify separate arguments for each function.
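A small sketch that checks this behaviour: both callables record how many extra arguments they received. The names my_cost, my_grad and seen, and the least-squares problem itself, are hypothetical and only serve the demonstration.

```python
import numpy as np
from scipy.optimize import minimize

seen = {}  # records how many extra arguments each callable received

def my_cost(w, *args):
    seen["fun"] = len(args)
    X, y = args
    residual = X @ w - y
    return 0.5 * residual @ residual

def my_grad(w, *args):
    seen["jac"] = len(args)
    X, y = args
    return X.T @ (X @ w - y)

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([0.5, 2.0])

# The same args tuple is forwarded to both fun and jac.
result = minimize(my_cost, np.zeros(2), args=(X, y), jac=my_grad, method="BFGS")
print(seen)
```

If the gradient needs extra inputs that the cost does not, one workaround is to put everything in args and let each function ignore what it does not use.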

EDIT 1: After reading through the minimize documentation more closely: if the parameter jac is set to True, the optimizer assumes that fun returns a tuple of the energy and the gradient. Reading the docs thoroughly is helpful, it seems.
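A minimal sketch of what this looks like with minimize, so the cost and gradient come out of one call. The least-squares cost, the synthetic data, the choice of L-BFGS-B, and the names cost_and_grad, X, y and w_true are assumptions for illustration, not the model from the question.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(w, X, y):
    """Hypothetical least-squares cost; returns (cost, gradient) together."""
    residual = X @ w - y
    cost = 0.5 * residual @ residual
    grad = X.T @ residual
    return cost, grad

# Synthetic, noise-free data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# jac=True tells minimize that fun returns (cost, gradient),
# so no separate gradient function is needed.
result = minimize(cost_and_grad, x0=np.zeros(3), args=(X, y),
                  jac=True, method="L-BFGS-B")
print(result.x, result.nfev)
```

Here result.nfev reports how many times the combined function was evaluated, which is a quick way to compare settings on your own problem.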

The version of the docs you linked is an old one; it is unlikely you are using that version of SciPy. If you run `import scipy; scipy.__version__` you will see which version you have and can navigate to the matching version of the docs. I think 1.4.x is the latest, although 1.5.x should be released soon. — Lucas Roberts, Jun 20 '20 at 02:46
