3

I am programming an expectation-maximization algorithm with R. In order to speed-up the computation, I would like to vectorize this bottleneck. I know that N is about a hundred times k.

MyLoglik = 0
for (i in c(1:N))
{
 for (j in c(1:k))
 {
  MyLoglik = MyLoglik + MyTau[i,j]*log(MyP[j]*MyF(MyD[i,], MyMu[j,], MyS[[j]]))
 }
}

There is also this list of matrices:

MyDf.list <- vector("list", k)
for(i in 1:k)
{
 MyDf.list[[i]] <- matrix(0,d,d)
 for (j in c(1:N))
 {
  MyDf.list[[i]] = MyDf.list[[i]] + MyTau[j,i]*as.numeric((MyD[j,]-MyMu[i,])) %*% t(as.numeric(MyD[j,]-MyMu[i,]))  
 }
 MyDf.list[[i]] = MyDf.list[[i]] / MyM[i]
}

I have sped things up a bit using:

MyLoglik = 0
for (j in c(1:k))
{
 MyR= apply(MyD, 1, function(x) log(MyP[j]*MyF(x, MyMu[j,], MyS[[j]])))
 MyLoglik = MyLoglik + sum(MyTau[,j]*MyR)
}

and:

d = dim(MyD)[2]
MyDf.list <- vector("list", k)
for(i in 1:k)
{
 MyDf.list[[i]] <- matrix(0,d,d)
 MyR= apply(MyD, 1, function(x) as.numeric((x-MyMu[i,])) %*% t(as.numeric(x-MyMu[i,])))
 MyDf.list[[i]] = matrix(rowSums(t(MyTau[,i]*t(MyR))) / MyM[i],d,d)
}
Alex Riley
  • 169,130
  • 45
  • 262
  • 238
Wok
  • 4,956
  • 7
  • 42
  • 64

3 Answers3

4

For the first one, I'm assuming MyF is a function you've made? If you can make sure it will take your matrices and lists as inputs and output a matrix, you could do something like:

MyLoglik = sum(MyTau%*%log(MyP)) + sum(MyTau*log(MyF(MyD, MyMu, MyS)))

For the second one, I think because you're doing it as list it will be more difficult to vectorize. Maybe instead of a list of matrices you could have a 3-dimensional array? So that MyDf.array[i,j,k] has dimensions N, d, d (or d, d, N).

Wine
  • 246
  • 1
  • 3
  • Nice suggestion for the first one! About the second one, I thought list was the only way to get the arrays like in Matlab. – Wok Nov 22 '10 at 19:09
  • 3
    Check ?array - it can handle multiple dimensions. – Wine Nov 22 '10 at 19:18
3

I hate to even suggest this prematurely, but this is the sort of thing where building a C-extension in R might make sense. For matrices with defined (known) size (which you have here!), C-extensions aren't that hard to build, I promise! The nastiest bit here would probably be passing in 'myF'

My R-knowledge is quite out of date, but for loops (especially like this one!) used to be brutal.

Maybe timing and figuring out which part is slow would help? Is it myF? What if you change it to an identity?

Gregg Lind
  • 20,690
  • 15
  • 67
  • 81
  • Thanks for the advice. `myF` is a call to the density function (abbreviated pdf on Wikipedia) of a [Multivariate normal distribution](http://en.wikipedia.org/wiki/Multivariate_normal_distribution). This is a one-liner which is fast. This is really the loop over N which takes most of the time. N is 500 while k is 4. – Wok Nov 22 '10 at 19:01
  • I would also suggest pastebinning the whole example and submit to the R-list. I don't do much R anymore, but E-M is a well-understood area! (esp. with a simple e-m function like mvnorm!) I am guessing this is Solved (tm). – Gregg Lind Nov 22 '10 at 19:05
2

You can cut down on the work done in the inner loop if things are symmetric: A[i,j] = A[j,i]

duffymo
  • 305,152
  • 44
  • 369
  • 561