
I am new to Python, so I apologize in advance if my code looks clumsy. For the past few weeks I have been working on a difficult statistical challenge called deductible modeling. I will try to avoid unnecessary statistical jargon and keep the problem simple because, as I understand it, my issue is a programming/optimization one, not a statistical one.

Please move this thread to a more appropriate Stack Exchange site if you think it belongs there.

I have 8 parameters, 2 of which (phi_freq and phi_sev) are constrained to be positive. I am trying to maximize a log-likelihood function that is, as far as I know, non-convex, multi-modal, continuous, real-valued, and not differentiable in closed form as a function of all those parameters. Yikes!

I know that for such problems the seed values fed to the search algorithm have a tremendous impact on which local/global optimum is reached, but as far as I can tell my starting values are solid: even though they are hardcoded in the sample below, they came out of an auxiliary optimization rather than subjective hunches.

I have tried the Nelder-Mead and SLSQP methods of scipy.optimize.minimize (the SLSQP call is commented out), but the values returned are awkward and nonsensical.

The following is an MWE:

import numpy as np
import pandas as pd
import scipy as sp
import scipy.optimize
import scipy.special
import scipy.stats

def freq_mean(intercept_freq, beta_veh_age_log, beta_dr_section_b):
    # Log-link frequency mean, evaluated on the global freq_data frame.
    return np.exp(intercept_freq + beta_veh_age_log * freq_data['veh_age_log'] + beta_dr_section_b * freq_data['dr_section_b'])

def sev_mean(intercept_sev, alpha_veh_age_log, alpha_acv_log, df):
    # Log-link severity mean, evaluated on whichever frame is passed in.
    return np.exp(intercept_sev + alpha_veh_age_log * df['veh_age_log'] + alpha_acv_log * df['acv_log'])

def freq_dist(x, mu, phi):
    # Negative binomial pmf with mean mu and dispersion phi.
    return (sp.special.gamma(phi + x) / sp.special.gamma(phi) / sp.special.factorial(x)) * ((phi / (phi + mu)) ** phi) * ((mu / (phi + mu)) ** x)

def sev_dist(x, mu, phi, ded):
    # Gamma density with mean mu and shape phi, left-truncated at the deductible.
    gamma_pdf = (((phi / mu) ** phi) / sp.special.gamma(phi)) * (x ** (phi - 1.0)) * np.exp(-phi * x / mu)
    gamma_sdf = 1.0 - sp.stats.gamma.cdf(x = phi * ded / mu, a = phi, scale = 1.0)

    if gamma_sdf == 0.0:
        return 0.0  # degenerate truncation; np.log(0.0) downstream yields -inf
    else:
        return gamma_pdf / gamma_sdf

def sev_loglik(intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev):
    # Severity log-likelihood over all observed claims.
    sev_data['mu_sev'] = sev_mean(intercept_sev, alpha_veh_age_log, alpha_acv_log, sev_data)
    return sev_data.apply(lambda x : np.log(sev_dist(x['claim_amount'], x['mu_sev'], phi_sev, x['deductible'])), axis = 1).sum()

def freq_loglik(intercept_freq, beta_veh_age_log, beta_dr_section_b, phi_freq, intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev):
    # Frequency log-likelihood; the claim probability depends on the severity
    # parameters through the deductible.
    freq_data['mu_sev'] = sev_mean(intercept_sev, alpha_veh_age_log, alpha_acv_log, freq_data)
    freq_data['claim_prob'] = 1.0 - sp.stats.gamma.cdf(x = phi_sev * freq_data['deductible'] / freq_data['mu_sev'], a = phi_sev, scale = 1.0)
    freq_data['mu_freq'] = freq_mean(intercept_freq, beta_veh_age_log, beta_dr_section_b)
    return freq_data.apply(lambda x : np.log(freq_dist(x['claim_count'], x['exposure'] * x['claim_prob'] * x['mu_freq'], x['exposure'] * phi_freq)), axis = 1).sum()

def obj_func(param):
    intercept_freq, beta_veh_age_log, beta_dr_section_b, phi_freq, intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev = param
    ll_sev = sev_loglik(intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev)
    ll_freq = freq_loglik(intercept_freq, beta_veh_age_log, beta_dr_section_b, phi_freq, intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev)
    return -(ll_freq + ll_sev)

def mle(nfev = 100):
    # Method-of-moments estimates, hardcoded here as starting values.
    intercept_freq, beta_veh_age_log, beta_dr_section_b, phi_freq, intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev = (-1.87515443, -0.5675389200, -0.0216802900, 23.78568667, 10.42040743, 0.00465891, 0.00072216, 0.69497075)
    seeds = np.array([intercept_freq, beta_veh_age_log, beta_dr_section_b, phi_freq, intercept_sev, alpha_veh_age_log, alpha_acv_log, phi_sev])
    return sp.optimize.minimize(fun = obj_func, x0 = seeds, method = 'Nelder-Mead', options = {'maxfev' : nfev})
    #cons = ({'type': 'ineq', 'fun' : lambda x : x[3]}, {'type': 'ineq', 'fun' : lambda x : x[7]})
    #return sp.optimize.minimize(fun = obj_func, x0 = seeds, method = 'SLSQP', constraints = cons)

policies = pd.read_csv('C:/policies.txt', sep = '\t')
claims = pd.read_csv('C:/claims.txt', sep = '\t')

# One row per policy, with its claim count (0 for claim-free policies).
freq_data = pd.merge(left = policies, right = claims.groupby('ID').agg(claim_count = ('claim_amount', 'count')), how = 'left', on = 'ID')
freq_data['claim_count'] = freq_data['claim_count'].fillna(0)
# One row per claim, joined back to the policy covariates.
sev_data = pd.merge(left = claims[['ID', 'claim_amount']], right = policies.drop(['dr_section_b', 'exposure'], axis = 1), how = 'left', on = 'ID')

opt = mle(4000)

I am unsure whether providing the data (the policies and claims flat files referenced in the sample code) is necessary at this point. I am prepared to do so, but I would need to anonymize the files first.

So, I guess at this point any pointers would be welcome, as I am close to hitting a wall. A global solution must exist, meaning the MLEs (maximum likelihood estimates) must exist. My seed values are very realistic, as they were obtained through moment matching (the so-called method-of-moments estimators). I feel there must be something else I'm doing wrong. Also, as lame as that may sound, I even reproduced the exact same problem with Excel's solver and ran into similar numerical convergence issues. I am happy to provide supplementary details, technical or not, regarding this problem.
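For what it's worth, here is an untested sketch of one idea I have been considering: dropping the positivity constraints altogether by optimizing the logs of phi_freq and phi_sev, so any unconstrained method applies (obj_func is the objective above, and seeds stands for the same 8-element seed array built inside mle):

# Sketch only: search over log(phi_freq) and log(phi_sev) so that every
# candidate the solver proposes maps back to strictly positive values.
def obj_func_log_phi(param):
    p = np.array(param, dtype = float)
    p[3] = np.exp(p[3])  # log(phi_freq) -> phi_freq > 0
    p[7] = np.exp(p[7])  # log(phi_sev)  -> phi_sev  > 0
    return obj_func(p)

seeds_log = seeds.copy()
seeds_log[3] = np.log(seeds_log[3])
seeds_log[7] = np.log(seeds_log[7])
opt_log = sp.optimize.minimize(fun = obj_func_log_phi, x0 = seeds_log,
                               method = 'Nelder-Mead', options = {'maxfev' : 4000})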

RSMax
  • There are a few other methods to try. Also turn on the log (option disp = True). Obviously we cannot reproduce this, so our ability to help is limited. In general for these types of problems you need to worry about several things: (1) scaling, (2) the initial point, (3) bounds, (4) exact and correct gradients (depending on the solver). – Erwin Kalvelagen Dec 16 '19 at 04:24
  • @ErwinKalvelagen Do you mean that there's no obvious mistake in my code? Also, what does the log show? Intermediate values of the parameters tested? Finally, the reason I'm resorting to the Nelder-Mead method is that I'm unable to compute the gradient in closed form. The objective function involves the incomplete gamma function evaluated at a function of alpha, for shape parameter alpha. As far as I know, there is no obvious way of differentiating that with respect to alpha. – RSMax Dec 17 '19 at 00:32
  • It is probably somewhat unrealistic to expect that casual reading of unfamiliar code (that one cannot execute) will find anything but the most glaring mistakes. For gradients of the incomplete gamma function I use: R. J. Moore, Algorithm AS 187: Derivatives of the Incomplete Gamma Integral, Applied Statistics 31 (1982), no. 3, 330–335. – Erwin Kalvelagen Dec 17 '19 at 01:24
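A minimal sketch of the logging suggestion from the comments, assuming obj_func and seeds from the question; minimize's callback is invoked once per iteration with the current best point, so this prints the intermediate parameter values:

# Sketch: log intermediate parameter values during the Nelder-Mead search.
iteration = {'count': 0}

def log_progress(xk):
    # Called once per iteration with the current best parameter vector.
    iteration['count'] += 1
    print(iteration['count'], obj_func(xk), xk)

sp.optimize.minimize(fun = obj_func, x0 = seeds, method = 'Nelder-Mead',
                     callback = log_progress,
                     options = {'maxfev' : 4000, 'disp' : True})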

1 Answer


Scipy contains a number of optimization algorithms, and for lack of a better suggestion, why don't you try them? The fact that you can't compute your gradient analytically is not a blocker: scipy will approximate it for you numerically.
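For instance (a minimal sketch reusing obj_func and the seed array from the question; the gradient is estimated by finite differences):

# Sketch: BFGS with a numerically approximated gradient.
res = sp.optimize.minimize(fun = obj_func, x0 = seeds, method = 'BFGS',
                           options = {'disp' : True})
print(res.success, res.fun)
print(res.x)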

Specifically, you may want to try BFGS first. If that also fails, then either there is a mistake in your code (which we can't easily find, as it is not runnable as it stands) or your objective function is much tougher and needs a more powerful, although slower, global optimizer. Scipy ships with quite a few of them (differential evolution, basin hopping, SHGO, dual annealing), and the web is full of alternative global optimization routines for Python.
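If you go the global route, the call could look like this (a sketch only; the box bounds are illustrative guesses you would have to adapt, and they conveniently keep phi_freq and phi_sev positive):

# Sketch: global search with differential evolution inside box bounds.
# All bounds below are hypothetical placeholders, not recommendations.
bounds = [(-5.0, 5.0),    # intercept_freq
          (-2.0, 2.0),    # beta_veh_age_log
          (-2.0, 2.0),    # beta_dr_section_b
          (1e-6, 100.0),  # phi_freq, kept positive
          (0.0, 20.0),    # intercept_sev
          (-1.0, 1.0),    # alpha_veh_age_log
          (-1.0, 1.0),    # alpha_acv_log
          (1e-6, 10.0)]   # phi_sev, kept positive
res = sp.optimize.differential_evolution(obj_func, bounds, seed = 0, disp = True)
print(res.x, res.fun)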

Infinity77