I have a number of operations on data frames which I would like to speed up using mclapply() or other lapply()-like functions. One of the easiest ways for me to wrestle with this is to make each row of the data frame a small data frame in a list. I can do this pretty easily with plyr like this:
df <- data.frame(a = rnorm(1e4), b = rnorm(1e4))
require(plyr)
# turn each row into its own one-row data frame
system.time(myList <- alply(df, 1, function(x) data.frame(x)))
Once I have my data as a list, I can easily do things like:
mclapply(myList, function(x) doSomething(x$a))
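For a concrete (if toy) example, substituting mean() for doSomething() and assuming mclapply() from the parallel package:

require(parallel)
# mean() stands in for doSomething(); mc.cores = 8 is just an example value
results <- mclapply(myList, function(x) mean(x$a), mc.cores = 8)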
This works swimmingly, but I have quite a lot of data and the alply() step is quite slow. I tried using the multicore parallel backend on the alply() step, but it never used more than one processor even though I had registered 8. I suspect the parallel option may not work with this type of problem.
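For reference, the parallel attempt looked roughly like this (assuming a doMC-style registration; the exact backend call shown here is an approximation):

require(doMC)      # backend for foreach, which plyr's .parallel option uses
registerDoMC(8)    # register 8 workers
system.time(myList <- alply(df, 1, function(x) data.frame(x), .parallel = TRUE))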
Any tips on how to make this faster? Maybe a base R solution?
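For comparison, the kind of base R alternative I have in mind (untested for relative speed) would be something like:

# split on the row index; each element is a one-row data frame
myList <- split(df, seq_len(nrow(df)))
# or, more explicitly, index row by row
myList <- lapply(seq_len(nrow(df)), function(i) df[i, , drop = FALSE])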