I have a set of about 50,000 documents that I've transformed into a corpus, and I have been fitting LDA models with the topicmodels package in R. Unfortunately, fitting a model with more than 150 topics takes several hours.
So far, I've found that I can fit models for several different numbers of topics simultaneously using:
library(topicmodels)
library(plyr)
library(foreach)
library(doMC)
registerDoMC(5) # use 5 cores
# dtm is my DocumentTermMatrix
ks <- seq(200, 500, by = 50) # candidate numbers of topics
models <- llply(ks, function(k) LDA(dtm, k), .parallel = TRUE) # one fit per core
Is there a way to parallelize the LDA function itself so that a single fit runs faster, rather than running several LDA fits at once?