Questions tagged [topicmodels]

topicmodels is an R package for fitting topic models: Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM).

Excerpt from topicmodels page on CRAN:

Provides an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topics Models (CTM) by David M. Blei and co-authors and the C++ code for fitting LDA models using Gibbs sampling by Xuan-Hieu Phan and co-authors.
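As a rough illustration of the interface described above, here is a minimal sketch of fitting an LDA model by Gibbs sampling. It assumes the `AssociatedPress` document-term matrix bundled with the package; the 50-document subset, `k = 4`, the seed, and the iteration counts are arbitrary choices made for this example, not recommendations.

```r
library(topicmodels)

# Example document-term matrix shipped with topicmodels
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]  # small subset so the sketch runs quickly

# k (number of topics) and seed are illustrative assumptions;
# fixing the seed makes the Gibbs run reproducible
fit <- LDA(dtm, k = 4, method = "Gibbs",
           control = list(seed = 1234, burnin = 100, iter = 200))

top_terms  <- terms(fit, 5)           # 5 highest-probability terms per topic
doc_topic  <- topics(fit)             # most likely topic for each document
doc_probs  <- posterior(fit)$topics   # documents x topics probability matrix
```

Each row of `doc_probs` is a per-document topic distribution, so it can be joined back to document metadata by row name.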

101 questions
0
votes
1 answer

Is it possible to use topic modeling for a single document?

Is it rational to use topic modelling for a single document or, to be more precise, is it mathematically okay to use the LDA Gibbs method for a single document? If so, what should the values of k and seed be? Also, what is the role of k and seed for single as…
rishav
  • 3
  • 1
  • 4
0
votes
0 answers

Vector Size Specified is too large in R

I am trying to fetch tweets for a keyword, say "zomato", and run topic modelling on the fetched tweets. The following search function fetches the tweets: search <- function(searchterm) { #access tweets and create…
0
votes
1 answer

How to handle bigrams of the same words in a different order in topic modeling in Python? E.g. 'lease extension' and 'extension lease'

Hello Stack Overflow community, I am reaching out for ideas on how to handle bigrams of the same words in a different order in topic modeling in Python. I have a topic model where two bigrams that mean the same thing are treated as different…
0
votes
1 answer

Plotting topic prevalence for each group [Structural Topic Modeling R]

Community, I have a question regarding the STM package for R and hope that you can help me find an answer. In figure 7 of the vignette, the authors present a graph showing the topic prevalence (for topic 7) over time. Is it possible to…
Hu_Ca
  • 47
  • 1
  • 5
0
votes
1 answer

Topic Modeling: LDA and BTM

Does anyone here know about topic modeling? I badly need help. 1) What is topic modeling? 2) What are Latent Dirichlet Allocation and Biterm Topic Modeling? 3) What is the difference between LDA and BTM? 4) How do they work? I found studies but I…
Dan
  • 35
  • 7
0
votes
0 answers

Why doesn't estimateEffect from STM work on my code?

I ran into a problem while estimating a covariate effect on an STM model in R. Any suggestions for solving it? library(quanteda) data <- read.csv("nr_11r.csv") data$documents <- as.character(data$documents) data$gender <-…
puspa
  • 1
  • 2
0
votes
2 answers

Error: No tidy method for objects of class LDA_VEM

I am literally following the steps as presented in chapter 6 of the "Text Mining with R: A Tidy Approach" book. See: https://www.tidytextmining.com/topicmodeling.html #import libraries library(topicmodels) library(tidytext) #access…
Vasino
  • 3
  • 1
  • 2
0
votes
1 answer

Additional seedwords argument in LDA() function from topicmodels

I am looking for an in-depth example of Latent Dirichlet Allocation (LDA) with seedwords specified for the topicmodels package in R. The basic function takes the form: LDA(x, k, method = "Gibbs", control = NULL, model = NULL, ...) And the…
0
votes
2 answers

How to predict topics for a batch of documents with mallet

I am using mallet from a Scala project. After training the topic models and getting the inferencer file, I tried to assign topics to new texts. The problem is that I get different results with different calling methods. Here are the things I tried: creating…
yang
  • 498
  • 5
  • 22
0
votes
0 answers

Combining text stored in dataframe and folders to one corpus

I have text data stored in two different formats: as a dataframe and as a series of folders (because of the storage type, I'm not sure I will be able to post this question in a reproducible format). I'm able to create a corpus from each of these…
sabrina
  • 43
  • 1
  • 1
  • 8
0
votes
0 answers

What’s next after Topic modelling in LDA

I’m new to topic modelling, so I hope someone experienced can answer my queries. Here’s a simplified format of my data: 1. I have a csv file of dimension 1000*2 (mixture of topics). 2. Each row is a document and a document ID. Each document can…
R_abcdefg
  • 145
  • 1
  • 11
0
votes
3 answers

quanteda convert to topicmodels retaining docvars

I'm using the awesome quanteda package to convert my dfm to a topicmodels format. However, in the process I'm losing my docvars which I need for identifying which topics are most likely prevalent in my documents. This is especially a problem given…
0
votes
0 answers

Defining own stopwords by their beginning

I'm looking for code that lets me delete my own stopwords from my text corpus by defining only how they begin. Example: my corpus contains newspaper articles, and there are also additional https.... internet links included,…
0
votes
1 answer

Why do we need the hyperparameters beta and alpha in LDA?

I'm trying to understand the technical part of Latent Dirichlet Allocation (LDA), but I have a few questions on my mind: First: Why do we need to add alpha and gamma every time we sample the equation below? What if we delete the alpha and gamma from…
Mr. Almars
  • 11
  • 4
0
votes
0 answers

Odd symbols in R script lost after reloading

I am implementing an LDA topic model using the tm and topicmodels packages. Some of the documents contain odd characters that are not removed automatically (e.g. docs <- tm_map(docs, removePunctuation) does not remove ’). When I read the .txt files into…
Michael
  • 159
  • 1
  • 2
  • 14