Questions tagged [topicmodels]

topicmodels is an R package for fitting topic models: Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM).

Excerpt from topicmodels page on CRAN:

Provides an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM) by David M. Blei and co-authors and the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors.
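
As a rough illustration of the two LDA fitting methods named in the excerpt, the sketch below fits a model with variational EM and another with Gibbs sampling on a small slice of the AssociatedPress data bundled with the package. The subset size, k = 2, the seed, and the burnin/iter values are assumptions chosen only to keep the run short; they are not recommended settings.

```r
library(topicmodels)

# AssociatedPress is a DocumentTermMatrix that ships with topicmodels.
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:20, ]  # small subset so the sketch runs quickly

k <- 2  # number of topics; arbitrary value for illustration

# Variational EM, the package's default fitting method.
fit_vem <- LDA(dtm, k = k, method = "VEM", control = list(seed = 1))

# Gibbs sampling; burnin and iter are illustrative values only.
fit_gibbs <- LDA(dtm, k = k, method = "Gibbs",
                 control = list(seed = 1, burnin = 100, iter = 200))

# Five highest-probability terms per topic (a 5 x k character matrix).
top_terms <- terms(fit_gibbs, 5)

# Posterior distributions estimated by the fitted model.
post  <- posterior(fit_gibbs)
beta  <- post$terms   # topics x terms: per-topic distribution over terms
gamma <- post$topics  # documents x topics: per-document topic proportions
```

Each row of beta and of gamma is a probability distribution, so the rows should sum to 1; topics(fit_gibbs) returns the most likely topic for each document.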

101 questions
2
votes
1 answer

Quanteda with topicmodels: removed stopwords appear in results (Chinese)

My code: library(quanteda) library(topicmodels) # Some raw text as a vector postText <- c("普京 称 俄罗斯 未 乌克兰 施压 来自 头 条 新闻", "长期 电脑 前进 食 致癌 环球网 报道 乌克兰 学者 认为 电脑 前进 食 会 引发 癌症 等 病症 电磁 辐射 作用 电脑 旁 水 食物 会 逐渐 变质 有害 物质 累积 尽管 人体 短期 内 会 感到 适 会 渐渐 引发 出 癌症 阿尔茨海默 式…
Jackson-MSFT
2
votes
0 answers

Classifying new text using mallet package

Does anybody know if there is a way to classify new text data into topics using the R package mallet? The general routine for this package is: mallet.instances <- mallet.import(as.character(data$id), …
IVR
2
votes
2 answers

DocumentTermMatrix needs to have a term frequency weighting Error

I'm trying to use LDA() from the topicmodels package on a fairly large data set. After trying everything to fix the following errors "In nr * nc : NAs produced by integer overflow" and "Each row of the input matrix needs to contain at least one non-zero…
user1569341
2
votes
1 answer

In R topicmodels package, how could we get the topics' distributions over terms?

I'm running LDA using the topicmodels package. lda.model = LDA(dtm, k,control = list(em = list(iter.max = 1000, tol = 10^-4))) apps.terms<-terms(lda.model,15) head(apps.terms) Topic.1 Topic.2 Topic.3 Topic.4 Topic.5 1 38 55 187 …
ysfseu
2
votes
1 answer

Graph a single LDA topic by date (in R)

I have a group of text files from several journals (let's call them journal A and journal B) that I am trying to run LDA on. I divide them each into their own corpus, then attach the names of the files to each corpus, store the journal of origin…
mlinegar
1
vote
2 answers

get_coherence : C_V method gets an error but U_Mass works

I'm using the following code to check the coherence value. The problem is that the code below works well when I change the coherence type to "u_mass", but if I want to compute "c_v", an index error occurs. Previous text processing: # Remove Stopwords, Form…
Victoria L
1
vote
2 answers

R STM Topic Proportion table

I'm trying to make a table for my STM model just like this. I am new to the R programming language and STM. I have been searching the documentation for this and do not know if there is a function that produces this format or if I have to…
JS J
1
vote
0 answers

How can I make a model in R that uses predefined topics with certain words on a new set of words to determine the relatedness to the topics

I'm trying to build a model that can determine how related a string of text is to a predefined topic. I have tried several methods (mainly LDA with seed words and Naive Bayes) but can't really get the desired results. I have a list with two topics…
1
vote
0 answers

Output pyLDAvis topic keywords with chosen lambda

I used pyLDAvis to visualize LDiA model outputs, and by playing around with lambda I found that the keywords chosen for each topic are much more interpretable under certain lambda values than others. I wonder if there is an easy way to output keywords under certain…
MeiNan Zhu
1
vote
2 answers

How to keep the text id of removed text in lda

I have a dataframe like this dtext <- data.frame(id = c(1,2,3,4), text = c("here","This dataset contains movie reviews along with their associated binary sentiment polarity labels. It is intended to serve as a benchmark for sentiment classification.…
Nathalie
1
vote
1 answer

How to create a grid search to find best parameters?

In lda analysis library(topicmodels) # parameters for Gibbs sampling burnin <- 4000 iter <- 2000 thin <- 500 seed <-list(1969,5,25,102855,2012) nstart <- 5 best <- TRUE #Number of topics k <-…
Nathalie
1
vote
1 answer

Confusion matrix for LDA

I’m trying to check the performance of my LDA model using a confusion matrix but I have no clue what to do. I’m hoping someone can maybe just point me in the right direction. So I ran an LDA model on a corpus filled with short documents. I then…
Susan-l3p
1
vote
0 answers

In R, How to write data function in Topic Model?

Today I am learning topic modeling in R. My very first question is how to load the dataset below. I found that there are some built-in datasets in R, but how can I save a newly built dataset in R so that I can use it as I use others like crude, acq...? How to…
Dylan
1
vote
1 answer

Restore original document id from lda object

I'm trying to compare the "consensus" topic prediction (beta) from terms (in a given document) against the most likely predicted topic from the document itself (gamma) using functions from topicmodels. While it's easy to extract the most likely…
Chris T.
1
vote
0 answers

clarification on supervised LDA package in R

I'm using the lda package in R for topic modeling and am confused about the parameter settings in the function slda.em. slda.em(documents, K, vocab, num.e.iterations, num.m.iterations, alpha, eta, annotations, params, variance, logistic = FALSE, lambda =…
Qiuyi Wu