I'm relatively new to SciPy; I'm currently using it to minimize a cost function for a multi-layer perceptron model. I can't use scikit-learn because I need to be able to set the coefficients (they are read-only in MLPClassifier) and add random permutations and noise to any and all parameters. I haven't finished the implementation yet, but I am confused about the parameters required by the minimize function.
For example, I have written a function that calculates the "cost" (the energy to minimize) of the model and its gradient in the same call. That's nothing special, as it's common practice. However, scipy.optimize.minimize asks for two separate functions: one that returns the scalar to be minimized (the cost, in my case) and one that calculates the gradient at the current state. My function, by contrast, returns both at once:
j, grad = myCostFunction(X, y)
Unless I am mistaken, minimize would have to call my function twice per step, with each call configured to return either the cost or the gradient, like so:
opt = scipy.optimize.minimize(fun=myJFunction, jac=myGradFunction, args=args, ...)
Isn't this a waste of computation time? My data set will have more than 1 million samples and about 10 features, so I'd like to avoid redundant computation, since I will be training and retraining this model tens of thousands of times for my project.
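For what it's worth, the SciPy documentation for minimize says that when jac is set to True, fun is assumed to return a (cost, gradient) tuple, so a single call yields both. Below is a minimal sketch of that usage. The function cost_and_grad is a hypothetical least-squares stand-in for my real myCostFunction, and the data are made up purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def cost_and_grad(w, X, y):
    """Hypothetical stand-in for myCostFunction.

    Returns the least-squares cost and its gradient from one pass
    over the data, so nothing is computed twice.
    """
    residual = X @ w - y
    cost = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return cost, grad


# Made-up data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# jac=True tells minimize that fun returns (cost, gradient).
result = minimize(
    cost_and_grad,
    x0=np.zeros(3),
    args=(X, y),
    jac=True,
    method="L-BFGS-B",
)
```

Here result.x holds the fitted parameters. The choice of L-BFGS-B is only an assumption for the sketch; any gradient-based method should accept the same jac=True form.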
Another point of confusion is the args input. Are the arguments passed like this:
# This is what I expect happens
myJFunction(x0,*args)
myGradFunction(x0,*args)
or like this:
# This is what I wish it did
myJFunction(x0,arg0,arg1,arg2)
myGradFunction(x0,arg3,arg4,arg5)
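As far as the documented signatures go, both fun and jac are called as f(x, *args) with the same args tuple, which is the first form. The sketch below is one way to check this empirically: it records the extra arguments each callback receives. The names my_j, my_grad, and calls are hypothetical, and the quadratic cost is chosen only because its gradient is easy to write down.

```python
import numpy as np
from scipy.optimize import minimize

# Records the extra arguments seen by each callback.
calls = {"fun": None, "jac": None}


def my_j(x, a, b):
    """Hypothetical cost: a * sum((x - b)^2)."""
    calls["fun"] = (a, b)
    return a * np.sum((x - b) ** 2)


def my_grad(x, a, b):
    """Gradient of my_j with respect to x."""
    calls["jac"] = (a, b)
    return 2.0 * a * (x - b)


# The single args tuple is unpacked into BOTH callbacks.
result = minimize(
    my_j,
    x0=np.zeros(2),
    jac=my_grad,
    args=(3.0, 1.0),
    method="BFGS",
)

print(calls)  # both entries should hold the same (a, b) pair
```

If the two functions really needed different extra arguments, one option would presumably be to leave args empty and bind the arguments beforehand, for example with functools.partial or a lambda around each function.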
Thanks in advance!