I have a model for the four possible outcomes of purchasing a pair of items (both, neither, or just one of the two) and need to optimize the (pseudo-)log-likelihood function. Part of this, of course, is computing the pseudo-log-likelihood itself.
The following is my code, where Beta holds a 2-d vector for each customer (there are U customers, so U different beta vectors), X holds a 2-d vector for each of the N items (stored in x_vals in the code), and Gamma is a symmetric matrix with a scalar value gamma(i,j) for each pair of items. df is a dataframe of the purchases, with one row per customer and N columns for the items.
It seems to me that all of these nested loops are inefficient and take far too long, but I am not sure how to speed up the calculation and would appreciate any help improving it. Thank you in advance!

    import datetime
    import numpy as np

    # U, N, df and x_vals are globals defined elsewhere in the script.
    def pseudo_likelihood(Args):
        # Unpack the flat parameter vector: U beta vectors, then the N x N Gamma.
        Beta = np.reshape(Args[0:2*U], (U, 2))
        Gamma = np.reshape(Args[2*U:], (N, N))
        L = 0
        for u in range(U):
            print("{} for user {}".format(datetime.datetime.today(), u))
            y = df.loc[u][1:]  # purchase indicators of customer u
            beta_u = Beta[u, :]
            for l in range(N):
                print("{} for item {}".format(datetime.datetime.today(), l))
                for i in range(N-1):
                    if i == l:
                        continue
                    for j in range(i+1, N):
                        if y[i] == y[j]:
                            if y[i] == 1:
                                # Both purchased: log of the exponent of this expression
                                L += np.dot(beta_u, (x_vals.iloc[i, 1:] + x_vals.iloc[j, 1:])) + Gamma[i, j]
                            else:
                                # Neither purchased
                                L += np.log(
                                    1 - np.exp(np.dot(beta_u, (x_vals.iloc[i, 1:] + x_vals.iloc[j, 1:])) + Gamma[i, j])
                                    - np.exp(np.dot(beta_u, x_vals.iloc[i, 1:])) * (
                                        1 - np.exp(np.dot(beta_u, x_vals.iloc[j, 1:])))
                                    - np.exp(np.dot(beta_u, x_vals.iloc[j, 1:])) * (
                                        1 - np.exp(np.dot(beta_u, x_vals.iloc[i, 1:]))))
                        else:
                            if y[i] == 1:
                                # Only item i purchased
                                L += np.dot(beta_u, x_vals.iloc[i, 1:]) + np.log(1 - np.exp(np.dot(beta_u, x_vals.iloc[j, 1:])))
                            else:
                                # Only item j purchased
                                L += np.dot(beta_u, x_vals.iloc[j, 1:]) + np.log(1 - np.exp(np.dot(beta_u, x_vals.iloc[i, 1:])))
                # Normalization terms for item l
                L -= (N-2)*np.dot(beta_u, x_vals.iloc[l, 1:])
                for k in range(N):
                    if k != l:
                        L -= np.dot(beta_u, x_vals.iloc[k, 1:])
        return -L

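One thing I notice in the loops above is that the per-pair term only depends on the scalars dot(beta_u, x_i), dot(beta_u, x_j) and Gamma[i,j], not on l, yet every dot product is recomputed through `x_vals.iloc` in the innermost loop. Below is a minimal sketch of how the pair terms for one customer might be computed with NumPy broadcasting instead. The function name `pair_log_terms` and the argument `s` (the precomputed dot products for one customer) are my own illustrative names, not part of the code above, and the sketch assumes every log argument is positive.

```python
import numpy as np

def pair_log_terms(s, y, gamma):
    """Sum the per-pair log terms over all i < j for one customer.

    s     : (N,) array, s[i] = dot(beta_u, x_i), computed once per customer
    y     : (N,) array of 0/1 purchase indicators
    gamma : (N, N) symmetric array
    """
    s = np.asarray(s, dtype=float)
    y = np.asarray(y).astype(bool)
    e = np.exp(s)

    # Column/row views so that [i, j] refers to the pair (i, j).
    si, sj = s[:, None], s[None, :]
    ei, ej = e[:, None], e[None, :]
    yi, yj = y[:, None], y[None, :]

    both = si + sj + gamma  # both items purchased
    # Branches that are not selected below may be NaN/inf; silence the warnings.
    with np.errstate(invalid="ignore", divide="ignore"):
        neither = np.log(1 - np.exp(both) - ei * (1 - ej) - ej * (1 - ei))
        only_i = si + np.log(1 - ej)  # item i purchased, item j not
        only_j = sj + np.log(1 - ei)  # item j purchased, item i not

    terms = np.where(yi & yj, both,
                     np.where(~yi & ~yj, neither,
                              np.where(yi, only_i, only_j)))
    upper = np.triu_indices(len(s), k=1)  # keep only pairs with i < j
    return terms[upper].sum()
```

Assuming `x_vals.iloc[:, 1:]` holds the two feature columns, all dot products could presumably be computed once per call as `S = Beta.dot(x_vals.iloc[:, 1:].values.T)`, with `S[u]` passed as `s`. Since the loop over l only skips the row i == l, it looks as though the pair sum could then be reused across l rather than recomputed N times.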
To clarify: I am using this calculation to estimate the beta and gamma parameters that generated the data, by minimizing the negative pseudo-log-likelihood returned by this function.
I am using scipy.optimize.minimize with the 'Powell' method.
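A related point about the parameter vector: the function above only ever reads Gamma[i,j] for i < j, so with a full N x N reshape roughly half of the Gamma entries (plus the diagonal) never affect the objective, and a derivative-free method such as Powell still has to search along those directions. Here is a rough sketch of packing only the strict upper triangle instead. The helper names `pack` and `unpack` are hypothetical, and treating the diagonal as zero is my assumption.

```python
import numpy as np

def pack(beta, gamma):
    """Flatten Beta (U, 2) and the strict upper triangle of Gamma (N, N)."""
    iu = np.triu_indices(gamma.shape[0], k=1)
    return np.concatenate([beta.ravel(), gamma[iu]])

def unpack(args, U, N):
    """Inverse of pack: rebuild Beta and a symmetric Gamma with zero diagonal."""
    args = np.asarray(args, dtype=float)
    beta = args[:2 * U].reshape(U, 2)
    iu = np.triu_indices(N, k=1)
    gamma = np.zeros((N, N))
    gamma[iu] = args[2 * U:]   # fill entries with i < j
    gamma = gamma + gamma.T    # mirror them to enforce symmetry
    return beta, gamma
```

With this layout the vector passed to the optimizer would have 2*U + N*(N-1)/2 entries instead of 2*U + N*N, and `pseudo_likelihood` would call `unpack(Args, U, N)` in place of its two reshapes.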