I am trying to figure out how scipy.optimize.minimize works in machine learning. So far I understand that you pass it a loss function, so that it can find the parameter values that give you the lowest loss. But as far as I understand, it first has to find the gradient (or Hessian) of the loss function before it can find the minimum. How does it do that? How does it know how to take the derivative of a Python function, which contains other function calls and algorithms inside it? How can it reduce that to a mathematical expression that it can differentiate?

In my case, the loss function that I pass into minimize does the following: first it solves an epidemiological model (SEIRD) that is meant to predict confirmed cases and deaths from COVID-19. Then it compares the model's results to the actual data and returns the mean squared log error (MSLE). The code looks like this:
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.metrics import mean_squared_log_error

def model_decay(params, data, population, return_solution=False, full_validation=True, forecast_days=0):
    # R_0, fatality rate theta, decay shape k and L, final R_t, recovery and incubation periods
    R_0, theta, k, L, R_t, D_rec, D_inc = params
    N = population                                # population of the country
    n_infected = data['ConfirmedCases'].iloc[0]
    n_dead = data['Fatalities'].iloc[0]
    max_days = len(data) + forecast_days
    rho = 1.0 / D_rec                             # recovery rate
    alpha = 1.0 / D_inc                           # incubation rate
    y0 = N - n_infected, 0, n_infected, 0, 0      # initial S, E, I, R, D

    # Time-dependent transmission rate: decays from roughly R_0 * rho towards R_t * rho.
    def beta(t):
        return ((R_0 - R_t) / (1 + (t / L)**k) + R_t) * rho

    t = np.arange(max_days)

    # Solve the SEIRD differential equations.
    sol = solve_ivp(seir_d, [0, max_days], y0, args=(N, beta, rho, alpha, theta), t_eval=t)
    sus, exp, inf, rec, dead = sol.y

    # Predicted confirmed cases and fatalities.
    y_pred_cases = np.clip(inf + rec + dead, 0, np.inf)
    y_true_cases = data['ConfirmedCases'].values
    y_pred_dead = np.clip(dead, 0, np.inf)
    y_true_dead = data['Fatalities'].values

    # Thanks to anjum-48 for the following lines of code in this function.
    optim_days = min(20, len(data))                   # optimise over at most the last 20 days
    weights = 1 / np.arange(1, optim_days + 1)[::-1]  # recent data is more heavily weighted

    # Evaluate with mean squared log error.
    msle_conf = mean_squared_log_error(y_true_cases[-optim_days:], y_pred_cases[-optim_days:], sample_weight=weights)
    msle_dead = mean_squared_log_error(y_true_dead[-optim_days:], y_pred_dead[-optim_days:], sample_weight=weights)

    if full_validation:
        msle = np.mean([msle_conf, msle_dead])
    else:
        msle = msle_conf

    if return_solution:
        return msle, sol
    else:
        return msle
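For reference, seir_d (not shown above) is the right-hand side of the SEIRD system and is defined elsewhere in my notebook. A rough sketch of what such a right-hand side looks like, assuming the (S, E, I, R, D) state ordering and the (N, beta, rho, alpha, theta) arguments used above, would be:

def seir_d(t, y, N, beta, rho, alpha, theta):
    # Sketch only; the real implementation lives elsewhere in the notebook.
    S, E, I, R, D = y
    dS = -beta(t) * S * I / N              # susceptibles become exposed
    dE = beta(t) * S * I / N - alpha * E   # exposed become infectious after incubation
    dI = alpha * E - rho * I               # infectious leave the I compartment at rate rho
    dR = (1 - theta) * rho * I             # fraction (1 - theta) recovers
    dD = theta * rho * I                   # fraction theta dies
    return dS, dE, dI, dR, dD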
And the minimize call looks like this:
from scipy.optimize import minimize

model_decay_res = minimize(model_decay, [1.8, 0.04, 2, 20, 1, 15, 4.9],
                           bounds=((0, 6.49), (0.01, 0.15), (0, 5), (0, 200), (0, 6.49), (3.48, 18), (4.9, 5.9)),
                           args=(train_data, population, False, full_validation),
                           method='L-BFGS-B')
It gives me a very low MSLE in the end, so it works pretty well; I just don't understand how exactly.
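My best guess, since I don't pass a jac argument, is that minimize estimates the gradient numerically with finite differences, i.e. by re-evaluating the loss at slightly perturbed parameter values rather than by differentiating the Python code symbolically. A minimal sketch of that idea (forward differences with a hypothetical step size eps, not scipy's actual implementation) would be:

import numpy as np

def forward_difference_gradient(f, x, eps=1e-8):
    # Hypothetical sketch: estimate df/dx_i by nudging one parameter at a time.
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_step = x.copy()
        x_step[i] += eps                      # perturb the i-th parameter only
        grad[i] = (f(x_step) - f0) / eps      # slope along that parameter
    return grad

# e.g. grad = forward_difference_gradient(lambda p: model_decay(p, train_data, population), x0)

From the docs it also looks like this can be made explicit by passing jac='2-point' or jac='3-point' to minimize, but I'm not sure this is really what happens under the hood.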