I have been trying to do topic modeling on a collection of discussion forum posts from a MOOC. I tried basic LDA first, but the resulting topics were meaningless, so now I'm looking into seeding my topics to get better ones. I found the seededlda package, which takes a dfm as input along with a dictionary of seed terms. It works well! My issue is figuring out how each document (that is, each forum post) is categorized.
My original data has "userid" as a variable and "post" as the document I'm using for LDA. So far my code looks like this:
text <- introduction_posts$post
# tokenize first; recent quanteda versions expect dfm() to be given tokens
dfmt <- dfm(tokens(text, remove_numbers = TRUE)) %>%
  dfm_remove(stopwords('en'), min_nchar = 2)
#install.packages("seededlda")
library(seededlda)
slda <- textmodel_seededlda(dfmt,
                            seeded_dict,
                            valuetype = c("glob", "regex", "fixed"),
                            case_insensitive = FALSE,
                            residual = TRUE,
                            weight = 0.01,
                            max_iter = 2000,
                            alpha = NULL,
                            beta = NULL,
                            verbose = quanteda_options("verbose")
)
terms <- terms(slda)
How can I determine which topics (and their terms) go to which user?
When I used the LDA function from the topic modeling package, I used a document-term matrix (built with textmineR's CreateDtm) defined this way:
posts_dtm <- CreateDtm(doc_vec = introduction_posts$post, # character vector of documents
                       doc_names = introduction_posts$userid_bycourse, # document names
                       ngram_window = c(1, 2), # minimum and maximum n-gram length
                       stopword_vec = c(stopwords::stopwords("en"), # stopwords from the stopwords package
                                        stopwords::stopwords(source = "smart")))
which named the documents as it went along. In the end I was able to nicely see which topics went to which participants. But I can't seem to do that with the dfm that the seededlda package uses.
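To make clearer what I'm after, here is a minimal sketch of the kind of result I want. It is built on made-up data, and I'm not sure it is the intended approach. It assumes that naming the documents when the corpus is created carries the user ids through to the dfm, and that topics() from seededlda returns one most-likely topic per document in the same order as the dfm rows. The toy introduction_posts data frame and the seeded_dict entries below are placeholders, not my real data or dictionary.

```r
library(quanteda)
library(seededlda)

# Hypothetical stand-in for my real data: one row per post, unique user ids
introduction_posts <- data.frame(
  userid = c("u1", "u2", "u3", "u4"),
  post = c("I teach math and statistics at a high school",
           "I work as a software developer and write code daily",
           "Statistics and math teacher hoping to learn new methods",
           "Developer interested in code and software design"),
  stringsAsFactors = FALSE
)

# Name the documents by user id up front so the names follow the data into the dfm
corp <- corpus(introduction_posts$post, docnames = introduction_posts$userid)
toks <- tokens(corp, remove_numbers = TRUE, remove_punct = TRUE)
dfmt <- dfm(toks)
dfmt <- dfm_remove(dfmt, stopwords("en"), min_nchar = 2)

# Placeholder seed dictionary (my real one has different topics and terms)
seeded_dict <- dictionary(list(
  teaching = c("teach*", "math*", "statistic*"),
  software = c("software", "code", "developer")
))

slda <- textmodel_seededlda(dfmt, seeded_dict, residual = TRUE, max_iter = 200)

# One row per document: the user id and the most likely topic for that post
doc_topics <- data.frame(
  userid = docnames(dfmt),
  topic = as.character(topics(slda)),
  stringsAsFactors = FALSE
)
print(doc_topics)

# Full document-by-topic probabilities, one row per document
print(round(slda$theta, 2))
```

If this is roughly right, I could merge doc_topics back onto introduction_posts by userid. Several posts from the same user would presumably need unique document names, for example by combining the user id with a post index.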
Any help would be appreciated.