0

I am doing an EM estimation, in the M step of which I need Max Likelihood Estimation, which has 24 parameters. I tried the nlm/optim/maxLik functions in R. They are all very slow. Any suggestions are welcome. Thanks. This is the LogL function: (choices, M, S, K, N and Alpha are known.)

logl <- function(theta,choices,M,S,K,N,Alpha){

betas <- theta[(1:(S*(K+1)))]
betas<-matrix(betas,S,K+1,byrow=TRUE)

loglik <-for (n in 1:N){
pr1s=foreach (s=1:S) %dopar%{ 
pr11=foreach (i = 1:K) %dopar%{ 
exp(sum(betas[s,]*choices[[n]][i,]))/exp(sum(M[[i]]%*%betas[s,]))}
pr11=as.numeric(pr11)
prod(pr11) 
}
pr1sn=as.numeric(pr1s) 

l[n]= sum(Alpha*pr1sn)
}
L=-sum(log((l)))
return(L)}

What I want to get is :

ops=nlm(logl,theta.start,choices=choices,M=M,S=2,K=11,N=3,Alpha=Alpha,hessian=TRUE)
  • 8
    What is "very slow"? Seconds? Minutes? Hours? Days? Have you tried without paralellization? It might not be worth the overhead if your number of iterations is small. I somewhat doubt that nested parallel loops actually work as expected, but have never tried this. Profile your function to find out, which parts are slow, and try to vectorize. – Roland Dec 12 '12 at 09:21
  • 3
    You should give more details: describe the model, write your LogL function in math notation, provide a toy data set on which we could make some tries. – Elvis Dec 12 '12 at 09:22
  • 1
    The first thing I would change (only because its simple) is to avoid the use of $\exp(x)$ until you need it. I think you can at least remove the inner loop over K. First you have $\frac{\exp(x)}{\exp(y)}$ which is analytically equal to $\exp(x-y)$. Then you also have a product of exponentials, which is the exponential of the sum of the logs. $\exp(x_1-y_1)\exp(x_2-y_2)...\exp(x_K-y_K)=\exp(x_1+...+x_K-y_1-...-y_K)$. While these make no difference to the correct answer, it can make a big difference to your computing time. – probabilityislogic Dec 12 '12 at 14:54

1 Answers1

9

I advice you to make your code more tidy. Be consistent. It will be easier for you to read your code and to improve it.

As I understand you have three loops. All of them can be done in parallel, isn't it? Why are you doing one loop with for and other two with foreach? Is there a reason for it?

What does this assignment loglik <- for (n in 1:N) is for?

There is .combine argument in foreach that could be used here. There is an %:% operator for nested loops.

Tried to improve the code. However not sure if I have understood it correctly. And not sure if it is faster then yours. Reproducible example is necessary to give more precise answer with timing.

logl <- function(theta, choices, M, S, K, N, Alpha) {
  betas <- theta[(1:(S*(K+1)))]
  betas <- matrix(betas, S, K+1, byrow=TRUE)

  l <- foreach(n = 1:N, .combine = c) %:%
    foreach(s = 1:S, .combine = sum) %:%
      foreach(i = 1:K, .combine = prod) %dopar% {
        exp(sum(betas[s,] * choices[[n]][i,])) / exp(sum(M[[i]] %*% betas[s,]))
      }

  return(-sum(log(Alpha * l)))
}
djhurio
  • 5,437
  • 4
  • 27
  • 48