
I have a question about LDA with the topicmodels package in R. From a data frame, I created a matrix with documents as rows, terms as columns, and the number of times each term occurs in a document as the values. When I tried to run LDA, I got the error message "Error in !all.equal(x$v, as.integer(x$v)) : invalid argument type". The data contains 1675 documents and 368 terms. What can I do to make the code work?

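    library("tm")
    library("topicmodels")
    library("dplyr")  # for %>%, group_by(), tally()
    library("tidyr")  # for spread()

    # Count term occurrences per document, then spread to one column per term
    data_matrix <- data %>%
        group_by(documents, terms) %>%
        tally %>%
        spread(terms, n, fill=0)
    doctermmatrix <- as.DocumentTermMatrix(data_matrix, weightTf("data_matrix"))
    lda_head <- topicmodels::LDA(doctermmatrix, 10, method="Gibbs")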

Help is much appreciated!

Edit:

    # Toy data
    documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) 
    meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1) 
    meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10) 
    termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar") 
    toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)
martin21
  • Can you provide some toy data so I can actually run the R code? – ABIM Aug 17 '18 at 00:54
  • Hey BIM, like this? `#Toy Data documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1) meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10) termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar") toydata <- data.frame(documentstoy,meta1toy,meta2toy,termstoy)` – martin21 Aug 17 '18 at 08:49
  • Works fine with the toy example. `> lda_head A LDA_Gibbs topic model with 10 topics.` – pedram Aug 17 '18 at 12:10
  • You do realise you are supplying the LDA function with your data_matrix and not with the doctermmatrix. – phiver Aug 17 '18 at 12:14
  • Thanks for your comments! @phiver sorry, that was a mistake in the code in the question. In my original code I used "doctermmatrix" for LDA. – martin21 Aug 17 '18 at 12:59

1 Answer

I looked at the source code, and apparently the LDA() function only accepts integer entries in its input, so you have to convert your categorical variable into integer indicator columns, as below:

    library('tm')
    library('topicmodels')

    # Toy data: document ids plus two numeric metadata columns
    documentstoy <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)
    meta1toy <- c(3,4,1,12,1,2,3,5,1,4,2,1,1,1,1,1)
    meta2toy <- c(10,0,10,1,1,0,1,1,3,3,0,0,18,1,10,10)
    toydata <- data.frame(documentstoy, meta1toy, meta2toy)
    termstoy <- c("cus","cus","bill","bill","tube","tube","coa","coa","un","arc","arc","yib","yib","yib","dar","dar")

    # Add one integer 0/1 indicator column per unique term
    toy_unique <- unique(termstoy)
    for (i in seq_along(toy_unique)) {
        toydata[[toy_unique[i]]] <- as.integer(termstoy == toy_unique[i])
    }

    # Fit directly on the all-numeric data frame (no DocumentTermMatrix)
    lda_head <- topicmodels::LDA(toydata, 10, method="Gibbs")
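For context on the wording of the error: all.equal() returns TRUE when its two arguments match, but a character vector describing the differences when they do not, and applying `!` to a character vector fails with "invalid argument type". The base-R sketch below reproduces that behaviour. The vector names are made up for illustration, and the idea that the real matrix held character (or otherwise non-integer) values, for example because a document id column ended up among the counts, is an assumption based on the error text rather than something confirmed in the question.

```r
# Hypothetical stand-ins for the matrix entries that get compared.
counts_ok  <- c(1, 0, 2)        # whole-number counts
counts_bad <- c("1", "0", "2")  # counts that ended up as character

# all.equal() gives TRUE for a match ...
res_ok <- all.equal(counts_ok, as.integer(counts_ok))

# ... but a character description of the mismatch otherwise.
res_bad <- all.equal(counts_bad, as.integer(counts_bad))

# Negating that character result is what raises the error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(!res_bad, error = function(e) conditionMessage(e))
print(err)
```

If this is what happened, checking `str()` of the object passed to LDA() for non-numeric columns would be a reasonable first step.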
ABIM
  • Thank you very much! But you are using the dataframe for LDA then, right? So you do not create a DocumentTermMatrix? – martin21 Aug 18 '18 at 12:25
  • Yes, because the toy data does not have the documents or the terms columns. You can update the toydata to be representative of your actual data. – ABIM Aug 19 '18 at 02:16
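
On the DocumentTermMatrix question raised in the comments, here is a minimal sketch of one way to get from long-format data (one row per term occurrence) to an integer count matrix, keeping the document ids as row names so they are not mistaken for counts. The data frame `long_data` and its contents are hypothetical, and base R's xtabs() is swapped in for the dplyr/tidyr pipeline from the question. The final step only runs if tm and topicmodels are installed, and it is an illustration, not a verified fix for the original data.

```r
# Hypothetical long-format data: one row per (document, term) occurrence.
long_data <- data.frame(
  documents = c("d1", "d1", "d2", "d2", "d2", "d3"),
  terms     = c("cus", "bill", "cus", "tube", "tube", "coa"),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a documents x terms count table; the document ids
# become row names instead of a data column.
count_table <- xtabs(~ documents + terms, data = long_data)

# Convert to a plain matrix with integer storage.
count_matrix <- matrix(
  as.integer(count_table),
  nrow = nrow(count_table),
  dimnames = dimnames(count_table)
)

# Conditions to check before fitting: integer counts, no empty documents.
stopifnot(is.integer(count_matrix), all(rowSums(count_matrix) > 0))

# Optional: fit a small model if tm and topicmodels are installed.
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("topicmodels", quietly = TRUE)) {
  dtm <- tm::as.DocumentTermMatrix(count_matrix, weighting = tm::weightTf)
  fit <- topicmodels::LDA(dtm, k = 2, method = "Gibbs",
                          control = list(seed = 1))
  print(fit)
}
```

The number of topics (`k = 2`) and the seed are arbitrary choices for the tiny example.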