
I have some data and want to fit a given psychometric function p,

p(x) = 1 / (1 + exp(4 * s50 * (x50 - x))),

to it. I'm interested in the fit parameters as well as their errors. With the 'classical' method, using the curve_fit function from the scipy package, it is easy to get the parameters of p and their errors. However, I want to do the same using a maximum likelihood estimation (MLE). From the output and the figure you can see that both methods yield slightly different parameters. Implementing the MLE is not the problem, but I don't know how to get the errors of the parameters with this method. Is there an easy way to get them? My likelihood function L is

L(x50, s50) = prod_i p(x_i)^(5*y_i) * (1 - p(x_i))^(5*(1 - y_i)).

I was not able to adapt the code described here: http://rlhick.people.wm.edu/posts/estimating-custom-mle.html, but it is probably a solution. How can I implement it? Or is there any other way?

A similar function is fitted here using statsmodels: https://stats.stackexchange.com/questions/66199/maximum-likelihood-curve-model-fitting-in-python. However, the errors of the parameters are not calculated there either.

The negative log-likelihood function seems to be correct, since it yields the right parameters, but I was wondering whether this function has to depend on the y-data? The negative log-likelihood function l is simply l = -ln(L). Here is my code:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 

## libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import minimize


def p(x,x50,s50):
    """return y value of psychometric function p"""
    return 1./(1+np.exp(4.*s50*(x50-x)))

def initialparams(x,y):
    """return initial fit parameters for function p with given dataset"""
    midpoint = np.mean(x)
    slope = (np.max(y)-np.min(y))/(np.max(x)-np.min(x))
    return [midpoint, slope]

def cfit_error(pcov):
    """return standard errors of the fit from the covariance matrix"""
    return np.sqrt(np.diag(pcov))

def neg_loglike(params):
    """analytical negative log-likelihood. It depends on the dataset
    (xdata and ydata) and on the two parameters x50 and s50."""
    x50 = params[0]
    s50 = params[1]
    n = len(xdata)
    prod = 1.
    for i in range(n):
        prod *= p(xdata[i],x50,s50)**(ydata[i]*5) * (1-p(xdata[i],x50,s50))**((1.-ydata[i])*5)
    return -np.log(prod)


xdata = [0.,-7.5,-9.,-13.500001,-12.436171,-16.208617,-13.533123,-12.998025,-13.377527,-12.570075,-13.320075,-13.070075,-11.820075,-12.070075,-12.820075,-13.070075,-12.320075,-12.570075,-11.320075,-12.070075]
ydata = [1.,0.6,0.8,0.4,1.,0.,0.4,0.6,0.2,0.8,0.4,0.,0.6,0.8,0.6,0.2,0.6,0.,0.8,0.6]

intparams = initialparams(xdata, ydata)  ## guess some initial parameters


## normal curve fit using least squares algorithm
popt, pcov = curve_fit(p, xdata, ydata, p0=intparams)
print('scipy.optimize.curve_fit:')
print('x50 = {:f} +- {:f}'.format(popt[0], cfit_error(pcov)[0]))
print('s50 = {:f} +- {:f}\n'.format(popt[1], cfit_error(pcov)[1]))



## fitting using maximum likelihood estimation
results = minimize(neg_loglike, initialparams(xdata,ydata), method='Nelder-Mead')
print('MLE with self defined likelihood-function:')
print('x50 = {:f}'.format(results.x[0]))
print('s50 = {:f}'.format(results.x[1]))
#print results


## plotting the data and results
xfit = np.arange(-20,1,0.1)

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(xdata, ydata, 'xb', label='measured data')
ax.plot(xfit, p(xfit, *popt), '-r', label='curve fit')
ax.plot(xfit, p(xfit, *results.x), '-g', label='MLE')
plt.legend()
plt.show()

The output is:

scipy.optimize.curve_fit:
x50 = -12.681586 +- 0.252561
s50 = 0.264371 +- 0.117911

MLE with self defined likelihood-function:
x50 = -12.406544
s50 = 0.107389

Both fits and the measured data can be seen in the resulting plot [figure: measured data with curve fit and MLE fit]. My Python version is 2.7 on Debian Stretch. Thank you for your help.

Lukas
  • My advice is to work with a symbolic computation package to work out the log likelihood and formulas for uncertainty of estimates (via Hessian of likelihood function or something like that). Typically you can tell such packages to print a formula in a form which is easily parsed as a snippet of some other language (typically C or Fortran or some other ordinary language), so you can paste your derivations into another program. I work with Maxima (http://maxima.sourceforge.net) a lot, but you can also try Sympy (http://sympy.org), Sage, Axiom, or other systems. – Robert Dodier Sep 05 '18 at 20:39
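
A rough sketch of that suggestion using Sympy (illustrative only; it assumes the binomial likelihood with 5 trials per point implied by the code above, and the symbol names are made up for this example):

import sympy as sp

x, y, x50, s50 = sp.symbols('x y x50 s50')
pf = 1/(1 + sp.exp(4*s50*(x50 - x)))  ## psychometric function
## per-observation negative log-likelihood (binomial, 5 trials, success fraction y)
nll = -(5*y*sp.log(pf) + 5*(1 - y)*sp.log(1 - pf))
H = sp.hessian(nll, (x50, s50))  ## symbolic 2x2 Hessian
print(sp.simplify(H[0, 0]))  ## second derivative w.r.t. x50
print(sp.ccode(sp.simplify(H[0, 0])))  ## the same expression as a C snippet

Summing the per-observation Hessians over all data points and inverting the sum at the estimated parameters would then give the covariance matrix of the estimates, analogous to the numerical approach in the answer below.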

1 Answer


Finally, the method described by Rob Hicks (http://rlhick.people.wm.edu/posts/estimating-custom-mle.html) worked out. After installing numdifftools, I could calculate the errors of the estimated parameters from the Hessian matrix: the standard errors are the square roots of the diagonal elements of the inverse Hessian of the negative log-likelihood, evaluated at the estimated parameters.

Installing numdifftools on Linux with root privileges:

apt-get install python-pip
pip install numdifftools

A complete code example of my program from above:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 

## libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import minimize
import numdifftools as ndt



def p(x,x50,s50):
    """return y value of psychometric function p"""
    return 1./(1+np.exp(4.*s50*(x50-x)))

def initialparams(x,y):
    """return initial fit parameters for function p with given dataset"""
    midpoint = np.mean(x)
    slope = (np.max(y)-np.min(y))/(np.max(x)-np.min(x))
    return [midpoint, slope]

def cfit_error(pcov):
    """return standard errors of the fit from the covariance matrix"""
    return np.sqrt(np.diag(pcov))

def neg_loglike(params):
    """analytical negative log-likelihood. It depends on the dataset
    (xdata and ydata) and on the two parameters x50 and s50."""
    x50 = params[0]
    s50 = params[1]
    n = len(xdata)
    prod = 1.
    for i in range(n):
        prod *= p(xdata[i],x50,s50)**(ydata[i]*5) * (1-p(xdata[i],x50,s50))**((1.-ydata[i])*5)
    return -np.log(prod)


xdata = [0.,-7.5,-9.,-13.500001,-12.436171,-16.208617,-13.533123,-12.998025,-13.377527,-12.570075,-13.320075,-13.070075,-11.820075,-12.070075,-12.820075,-13.070075,-12.320075,-12.570075,-11.320075,-12.070075]
ydata = [1.,0.6,0.8,0.4,1.,0.,0.4,0.6,0.2,0.8,0.4,0.,0.6,0.8,0.6,0.2,0.6,0.,0.8,0.6]



intparams = initialparams(xdata, ydata)  ## guess some initial parameters


## normal curve fit using least squares algorithm
popt, pcov = curve_fit(p, xdata, ydata, p0=intparams)
print('scipy.optimize.curve_fit:')
print('x50 = {:f} +- {:f}'.format(popt[0], cfit_error(pcov)[0]))
print('s50 = {:f} +- {:f}\n'.format(popt[1], cfit_error(pcov)[1]))



## fitting using maximum likelihood estimation
results = minimize(neg_loglike, initialparams(xdata,ydata), method='Nelder-Mead')
## calculating errors from the Hessian matrix using numdifftools
Hfun = ndt.Hessian(neg_loglike, full_output=True)
hessian_ndt, info = Hfun(results.x)
## standard errors = sqrt of the diagonal of the inverse Hessian
se = np.sqrt(np.diag(np.linalg.inv(hessian_ndt)))

print('MLE with self defined likelihood-function:')
print('x50 = {:f} +- {:f}'.format(results.x[0], se[0]))
print('s50 = {:f} +- {:f}'.format(results.x[1], se[1]))

This generates the following output:

scipy.optimize.curve_fit:
x50 = -18.702375 +- 1.246728
s50 = 0.063620 +- 0.041207

MLE with self defined likelihood-function:
x50 = -18.572181 +- 0.779847
s50 = 0.078935 +- 0.028783

However, some runtime warnings occur while calculating the Hessian matrix with numdifftools: there are divisions by zero. These are probably caused by my self-defined neg_loglike function, which takes the logarithm of a product of many small numbers that can underflow to zero. In the end there are results for the errors nonetheless. The method using "Extending Statsmodels" is probably more elegant, but I couldn't figure it out.
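
A likely fix for those warnings (just a sketch, not the method from the blog post): rewrite neg_loglike to sum the logarithms per data point instead of taking the logarithm of one large product, so the intermediate value can no longer underflow to zero. The clipping constant 1e-12 is an arbitrary choice to keep the probabilities away from 0 and 1:

def neg_loglike_stable(params):
    """numerically stable variant: sum the logs instead of the log of a product"""
    x50, s50 = params
    probs = p(np.asarray(xdata), x50, s50)
    probs = np.clip(probs, 1e-12, 1. - 1e-12)  ## avoid log(0) when probs underflow
    y = np.asarray(ydata)
    return -np.sum(5.*y*np.log(probs) + 5.*(1. - y)*np.log(1. - probs))

results = minimize(neg_loglike_stable, initialparams(xdata, ydata), method='Nelder-Mead')

This should return the same minimum as neg_loglike, since -ln(prod_i L_i) = -sum_i ln(L_i).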

Lukas