As this post is long, I will put my questions here:
1. Is there a package in python that will give me the maximum likelihood estimator parameters, for a given number of parameters p, for the covariates x and the data values y? (Preferably with comprehensive documentation on how to implement it)
2. Is this method scalable i.e. will work if I try O(100) covariates and similar number of data samples?
Background:
I am trying to investigate things like the distribution of the maximum likelihood estimators with varying number of samples n /covariates p using python. My script generates the data for logistic regression just fine, but I have been unable to get any method of parameter estimation (i.e. the parameter values maximising the log likelihood) to work correctly.
Approaches I have tried:
-coding up my own version of Newton Raphson procedure. But the errors in my estimates were diverging on repeated iterations (I checked for obvious sign and inequality errors of course!)
-using Newton conjugate gradient implementation. This also failed, because of the error 'a truth value of an array is ambiguous'. When I was coding up my own version, I could use all() to cure this, but not when using a package. Strange because I thought Newton CG was meant to work for multivariate case!
-Finally, I am tying to use the package statsmodels. I don't know if I am being really obtuse, but I can't find any comprehensive documentation on this? For getting logistic regression paramters, the best I have found is this, from which I have tried e.g.
X,y = logit_data(np.power(10,6),p,theta)
y=np.reshape(y, (len(y),))
clf = LogisticRegression(random_state=0, solver='lbfgs', multi_class='multinomial').fit(X, y)
thetaEst = clf.get_params(X, y)
and I have tried with the last line as just:
thetaEst = clf.get_params()
But nothing seems to give me estimates for parameter values. I either have errors or an object that I don't expect. Surely there is a python package that should do this?!? Surely I shouldn't have to resort to R (I don't know R D: !!!)
Optional Code
I don't want to make this post longer, but I'm sure it will be asked for. So I have put in my code for implementing Newton Raphson, and I have been substituing this function with existing packages:
#Script for producing y data from p covariates from a specified distribution, specified theta paraemters,
#and n data samples for logit link function.
import numpy as np
import matplotlib.pyplot as plt
#Define link function here
def g(z):
g=1/(1+np.exp(-z))
return g
#For producing y data values given true paramters theta and number of covariates
def logit_data(n,p, theta):
#Define parameters
#1)Number of covariates
p_i = p+1 #with intercept
p_i=np.int(p_i)
#2) m as correct data type
n=np.int(n)
#4)Specify parameter valueas to be estimated
theta=np.reshape(theta, (p_i,1))
#5)Define distribution from which covariate values are drawn i.i.d., and initiate data values
X=np.zeros((n,p_i))
X[:,0]=1 #intercept
mean=0
sigma=1.5
X[:,1:]=np.random.normal(mean,sigma,(n,p))
#6)Produce y values treating y as a Bernoulli variable with p=g(X*theta)
r=np.random.uniform(0,1,n)
r=np.reshape(r, (len(r),1))
htrue=g(X.dot(theta))
y=htrue-r
y[y>=0]=1
y[y<0]=0
return X, y
#Newton Raphson implementation
def NewtonRaphson(X,y):
##NOTE: All functions negloglikelihood, gradf, hessian, return the values for f = (-ve of the log likelihood function)
#, to use NR method to minimise f (rather than maximise l)
#Define log likelihood function to be maximised
def negloglikelihood(y,h):
l= y.transpose() @ np.log(h) + (1-y).transpose() @ np.log(1-h)
f=-l
return f
#Define gradient of log likelihood function
def gradf(y, h, X):
a=(y-h).transpose()
gradl= np.matmul(a,X)
grad_f=-gradl
return grad_f
#Define second derivative (Hessian) of log likelihood function
def hessian(h, X):
D=np.identity(len(h))*(np.matmul(h,(1-h).transpose()))
H=-X.transpose() @ D @ X
Hf=-H
return Hf
#Minimise f=-l
#Produce initial theta estimate and probability parameter h
np.random.seed(555)
thetaEst=np.random.normal(1.1, 0.4, 6)
eta=np.matmul(X,thetaEst)
h=g(eta)
#While not at a manimum of f
#control constants
a=10e-8
b=10e-8
i=0
j=0
k=0
while not (np.linalg.norm(gradf(y,h,X)) < np.absolute(negloglikelihood(y,h)) * a + b):
i=i+1
#print('i = %s' %i)
h=g(np.matmul(X,thetaEst))
H=hessian(h,X) #Cholesky decomposition to ensure hessian (of f) is positive semi-definite)
# print(H)
try:
np.linalg.cholesky(H)
#print('j = %s' %j)
j=j+1
except np.linalg.LinAlgError:
print('Hessian not positive semi-definite!')
try:
v,w=np.linalg.eig(H)
# print(v,w)
v=np.absolute(v)
H=w @ np.diag(v) @ np.linalg.inv(w)
except:
return thetaEst
delta = 0
try:
delta=np.linalg.solve(H, np.reshape(gradf(y,h,X),(6,1))) #Solve for incremental theta step
#print('k = %s' %k)
k=k+1
except:
return thetaEst #Simply return theta estimate if have singular hessian
while negloglikelihood(y, h) > negloglikelihood(y, g(np.matmul(X,thetaEst+delta))):
print('!!')
delta=0.5*delta #Ensure added step is sufficently small so as not to diverge
thetaEst=thetaEst+delta
return thetaEst
#Main control
#1)Sample numbers to test for erros in beta, as powers of 10.
npowers=np.arange(1,2,0.05)
n=np.power(10,npowers)
#2)Number of independent covariates
p=5
#3)True theta to be estimated (parameter values)
theta=np.asarray([1,1.2,1.1,0.8,0.9,1.3])
#4)#Initiate arrays to store estimates of theta (and errors) computed at specified sample numbers N
Thetas=np.zeros((len(npowers),p+1))
Errors=np.zeros((len(npowers),p+1))
#5)Obtain random covariate values from specified distribution, and corresponding y values using true theta
#plus gaussian noise term.
X,y = logit_data(n[-1],p,theta)
#6)Calulcate cumulative means for given n values, for the theta estimates
for ind,N in enumerate(n):
N=np.int(N)
thetaTemp=NewtonRaphson(X[0:N,:],y[0:N])
Thetas[ind,:] = np.reshape(thetaTemp,6)
#7)Calculate true erros
#print(Thetas)
Errors=Thetas-theta.transpose()
absError=np.abs(Errors)
nerror=Errors*np.sqrt(n)[:,np.newaxis]
logerror = np.log10(absError)
#8)Save data as csv
#9)Plots
plt.scatter(X@theta, g(X@theta))
plt.scatter(X@theta,y)
plt.show()
fig=plt.figure()
for i in range(p+1):
plt.plot(npowers, logerror[:,i])
fig.suptitle('log10(Absolute Error) in MLE against log10(Number of samples,N) for logistic regression')
plt.xlabel('log_10(N)')
plt.ylabel('log_10(Absolute Error)')
fig.savefig('logiterrors7.png')
plt.show()