The code below provides a complete reproducible example. My question is (mainly) on my function thetaMax()
. This is a function which minimizes a log-likelihood of a psychometric process.
I am learning Python and doing so by translating my R functions to python. The code below works as expected. However, because I am learning Python, but my question is on style and quality.
The function thetaMax
would run over hundreds of thousands of individuals and in my R code is efficient and is split among multiple cores. But, before thinking about parallel processing, my first goal is make the python code as fast and efficient as possible.
There are probably many parts of my function thetaMax
that can be improved, but one aspect I am most concerned about is:
for i in range(0,len(x2)):
result[i] = (gpcm(theta, d = d[i], score = x2[i], a = 1, D = D))
I think doing this as a loop is probably bad and could be improved with some form of vectorization. Below are the complete contents necessary to implement this code, and thanks to anyone willing to offer suggestions on how to improve the code.
import numpy as np
from scipy.stats import binom
from scipy.optimize import minimize
def prob3pl(theta, a, b, c, D = 1.7):
result = c + (1 - c) / (1 + np.exp(-D * a * (theta - b)))
return(result)
def gpcm(theta, d, score, a, D = 1.7):
Da = D * a
result = np.exp(np.sum(Da * (theta - d[0:score])))/np.sum(np.exp(np.cumsum(Da * (theta - d))))
return(result)
d = np.array([[0, -1, .5, 1],[0,-.5,.2,1]])
a = np.array([1,1,1,1,1])
b = np.array([-1,.5,-.5,0,2])
c = np.array([0,0,0,0,0])
x = np.array([1,1,0,1,0,1,1])
indDichot = range(0,5,1)
def thetaMax(x, indDichot, a, b, c, D, d):
x1 = x[indDichot]
x2 = np.delete(x, indDichot)
result = [0] * len(x2)
def fn(theta):
if(len(x1) > 0):
p = prob3pl(theta, a, b, c, D = D)
logDichPart = sum(np.log(binom.pmf(x1,1,p)))
else:
logPolyPart = 0
if(len(x2) > 0):
for i in range(0,len(x2)):
result[i] = (gpcm(theta, d = d[i], score = x2[i], a = 1, D = D))
logPolyPart = sum(np.log(result))
else:
logPolyPart = 0
LL = -(logDichPart + logPolyPart)
return(LL)
out = minimize(fn, x0=0)
return(out)
thetaMax(x,indDichot,a,b,c,D=1,d = d)