Does anybody know of a way to classify new text data into topics using the R package mallet?

The general routine for this package is:

library(mallet)

# Build the instance list from document ids, texts, and a stopword file
mallet.instances <- mallet.import(as.character(data$id),
                                  as.character(data$text),
                                  "Documents/Projects/tm/stopwords.txt")

topic.model <- MalletLDA(num.topics=10)
topic.model$loadDocuments(mallet.instances)
topic.model$setAlphaOptimization(20, 100) # optimise hyperparameters every 20 iterations, after 100 burn-in iterations
topic.model$train(1000) # train the model
topic.model$maximize(10) # pick the best topic for each token

What I could not find anywhere is a way to classify new data using a pre-trained model. The alternatives would be either to use the topicmodels package or to run MALLET from the command line. Both options are reasonable (although I must say that I tend to get much more convincing results with MALLET than with topicmodels). However, since I have already trained a model with the R package mallet and do not want to change the topics, a way to classify new data with the mallet package itself would be quite helpful.
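One workaround that stays inside R is to "fold in" a new document against the trained topics. The sketch below is not MALLET's own inference routine; it is a simple EM estimate of the topic proportions with the topic-word distributions held fixed. The function name `infer_topics` and the toy `phi` and `vocab` are hypothetical. With a trained model I would assume `phi` comes from `mallet.topic.words(topic.model, smoothed = TRUE, normalized = TRUE)` and `vocab` from `topic.model$getVocabulary()`, and that new text is tokenised, lower-cased, and stopword-filtered the same way as at import.

```r
# Fold-in sketch: estimate topic proportions for one new document while
# keeping the trained topic-word distributions fixed.
#   tokens: character vector of tokens from the new document
#   phi:    K x V matrix, one row per topic, rows sum to 1 (assumed smoothed,
#           so every in-vocabulary word has non-zero probability in some topic)
#   vocab:  character vector of length V giving the column order of phi
infer_topics <- function(tokens, phi, vocab, alpha = 0.1, n_iter = 100) {
  # Count in-vocabulary tokens; out-of-vocabulary tokens become NA and are dropped
  counts <- as.numeric(table(factor(tokens, levels = vocab)))
  K <- nrow(phi)
  theta <- rep(1 / K, K)
  seen <- counts > 0
  if (!any(seen)) return(theta)  # nothing to learn from: return the uniform guess

  phi_seen <- phi[, seen, drop = FALSE]
  n_seen <- counts[seen]
  for (i in seq_len(n_iter)) {
    # E-step: share of each observed word type attributed to each topic
    resp <- phi_seen * theta                     # row k is scaled by theta[k]
    resp <- sweep(resp, 2, colSums(resp), "/")   # normalise each column
    # M-step: expected topic counts plus a small symmetric prior
    theta <- as.numeric(resp %*% n_seen) + alpha
    theta <- theta / sum(theta)
  }
  theta
}

# Toy example with two topics over a four-word vocabulary (made-up numbers)
vocab <- c("gene", "dna", "ball", "goal")
phi <- rbind(c(0.45, 0.45, 0.05, 0.05),
             c(0.05, 0.05, 0.45, 0.45))
theta <- infer_topics(c("gene", "dna", "dna", "unknown"), phi, vocab)
best_topic <- which.max(theta)
```

Here `theta` holds the estimated topic proportions and `best_topic` is the index of the most probable topic. Because this is a point estimate and not Gibbs sampling, the numbers will probably not match what MALLET's command-line inference would report for the same document.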

IVR

0 Answers