Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
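
The intuition in the paragraph above can be sketched in a few lines of Python. The two topics and their word probabilities are made-up values chosen for illustration, not output from a fitted model, and the helper names `log_likelihood` and `most_likely_topic` are hypothetical.

```python
import math

# Hypothetical per-topic word probabilities (illustrative values only).
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.3, "cat": 0.1, "meow": 0.1},
    "cats": {"dog": 0.1, "bone": 0.1, "cat": 0.5, "meow": 0.3},
}


def log_likelihood(words, word_probs, floor=1e-6):
    """Sum of log-probabilities of the words under one topic.

    Words missing from the topic get a small floor probability.
    """
    return sum(math.log(word_probs.get(w, floor)) for w in words)


def most_likely_topic(words):
    """Return the name of the topic under which the words are most probable."""
    return max(TOPICS, key=lambda t: log_likelihood(words, TOPICS[t]))


print(most_likely_topic(["dog", "bone", "dog"]))
print(most_likely_topic(["cat", "meow", "cat"]))
```

A real topic model learns these word probabilities from the corpus instead of having them written by hand.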

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

2 answers

What do the parameters of the csvIterator mean in Mallet?

I am using the MALLET topic modelling sample code, and though it runs fine, I would like to know what the parameters of this statement actually mean: instances.addThruPipe(new CsvIterator(new FileReader(dataFile), …

machine-learning nlp topic-modeling text-analysis mallet

asked Jan 13 '15 at 17:04

London guy

3 answers

Topic Modeling tool for large data set (30GB)

I'm looking for a topic modeling tool that can be applied to a large data set. My current training data set is 30 GB. I tried MALLET topic modeling, but I always got an OutOfMemoryError. If you have any tips, please let me know.

lda topic-modeling

asked Jul 14 '14 at 10:18

Benben

1 answer

Incremental training of Topic Models in MALLET

According to the MALLET documentation, it's possible to train topic models incrementally: "-output-model [FILENAME] This option specifies a file to write a serialized MALLET topic trainer object. This type of output is appropriate for pausing…

topic-modeling mallet

asked Apr 04 '14 at 21:23

vpekar

2 answers

Run cvb in mahout 0.8

The current Mahout 0.8-SNAPSHOT includes a Collapsed Variational Bayes (cvb) version for Topic Modeling and removed the Latent Dirichlet Allocation (lda) approach, because cvb can be parallelized way better. Unfortunately there is only documentation…

mahout lda topic-modeling

asked Feb 07 '13 at 17:24

JoKnopp

1 answer

Implementing Topic Model with Python (numpy)

Recently, I implemented Gibbs sampling for the LDA topic model in Python using numpy, taking some code from a site as a reference. In each iteration of Gibbs sampling, we remove one (current) word, sample a new topic for that word according to a…

python numpy machine-learning lda topic-modeling

asked May 09 '12 at 15:57

D T
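
The excerpt above is truncated, so the following is only a minimal sketch of the standard collapsed Gibbs update for LDA, not the asker's code. It uses the standard library `random` module instead of numpy to stay dependency-free. The function name `gibbs_lda`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                              # n(k)
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus: word ids 0-1 and 2-3 never share a document.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1], [3, 2, 2]]
doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
print(doc_topic)
```

The returned count tables can be normalized (after adding `alpha` and `beta`) to estimate the per-document topic mixtures and the per-topic word distributions.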

2 answers

Removing an "empty" character item from a corpus of documents in R?

I am using the tm and lda packages in R to topic model a corpus of news articles. However, I am getting a "non-character" problem represented as "" that is messing up my topics. Here is my workflow: text <- Corpus(VectorSource(d$text)) newtext <-…

r text-mining text-analysis lda topic-modeling

asked May 07 '12 at 20:02

user836015

1 answer

Topic Modeling: How do I use my fitted LDA model to predict new topics for a new dataset in R?

I am using the 'lda' package in R for topic modeling. I want to predict new topics (collections of related words in a document) for a new dataset using a fitted Latent Dirichlet Allocation (LDA) model. In the process, I came across predictive.distribution()…

r lda topic-modeling

asked May 07 '12 at 13:53

ankit sethi

0 answers

Gensim HDP - Top Topics' distribution for document

I want the topic distribution for my documents. However, Gensim's HDP show_topic() returns 20 topics by default, and I suppose they are not necessarily the best ones. After digging deeper, I found out there are 150 topics in total, as the truncation level…

python nlp gensim lda topic-modeling

asked Mar 23 '21 at 10:39

Shirish Bajpai

1 answer

(gensim) LdaMallet vs LdaModel?

What is the difference between using gensim.models.LdaMallet and gensim.models.LdaModel? I noticed that the parameters are not all the same and would like to know when one should be used over the other.

gensim lda topic-modeling mallet

asked Jun 25 '20 at 18:19

Desi Pilla

3 answers

A practical example of GSDMM in python?

I want to use GSDMM to assign topics to some tweets in my data set. The only examples I found (1 and 2) are not detailed enough. I was wondering if you know of a source (or care enough to make a small example) that shows how GSDMM is implemented…

python lda topic-modeling tweets

asked May 30 '20 at 21:17

Pie-ton

4 answers

Coherence score (u_mass) -18 is good or bad?

I read this question (Coherence score 0.4 is good or bad?) and found that the coherence score (u_mass) ranges from -14 to 14. But when I ran my experiments, I got a score of -18 for u_mass and 0.67 for c_v. I wonder how my u_mass score can be out of range…

nlp lda topic-modeling lsa topicmodels

asked May 26 '20 at 22:22

Dammio
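
For reference, one common definition of the UMass coherence of a topic sums log((D(w_i, w_j) + 1) / D(w_j)) over ordered pairs of the topic's top words, where D counts the documents containing the given words. The sketch below implements that definition only. The function name `u_mass`, the `eps` parameter, and the toy documents are assumptions for illustration, and libraries may aggregate the pairwise terms differently (for example by averaging instead of summing), so absolute values need not match across implementations.

```python
import math


def u_mass(top_words, documents, eps=1.0):
    """UMass coherence of one topic under the summed pairwise definition.

    Assumes every word in top_words occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + eps)
                / doc_freq(top_words[j])
            )
    return score


documents = [
    ["dog", "bone"],
    ["dog", "bone", "walk"],
    ["cat", "meow"],
    ["cat", "dog"],
]
print(u_mass(["dog", "bone"], documents))  # words that co-occur
print(u_mass(["dog", "meow"], documents))  # words that never co-occur
```

Under this definition each pair contributes at least log(eps / D(w_j)), so the attainable values depend on the corpus size and on how many top words are compared.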

1 answer

After applying gensim LDA topic modeling, how to get documents with highest probability for each topic and save them in a csv file?

I have used gensim LDA Topic Modeling to get associated topics from a corpus. Now I want to get the top 20 documents representing each topic: documents that have the highest probability in a topic. And I want to save them in a CSV file with this…

python csv gensim lda topic-modeling

asked Jun 01 '19 at 17:16

Aria
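
A minimal sketch of the ranking-and-export step described in the excerpt above, using only the standard library. It assumes each document's topic distribution is already available as a list of (topic_id, probability) pairs, which is the shape gensim's `LdaModel.get_document_topics` returns. The helper names, the CSV column names, and the sample probabilities are hypothetical.

```python
import csv
import io


def top_documents_per_topic(doc_topics, top_n=20):
    """Return rows (topic_id, doc_index, probability), best documents first.

    doc_topics holds one list of (topic_id, probability) pairs per document.
    """
    by_topic = {}
    for doc_index, pairs in enumerate(doc_topics):
        for topic_id, prob in pairs:
            by_topic.setdefault(topic_id, []).append((prob, doc_index))

    rows = []
    for topic_id in sorted(by_topic):
        ranked = sorted(by_topic[topic_id], reverse=True)[:top_n]
        rows.extend((topic_id, doc_index, prob) for prob, doc_index in ranked)
    return rows


def write_rows(rows, handle):
    """Write the ranked rows as CSV to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["topic_id", "doc_index", "probability"])
    writer.writerows(rows)


# Hypothetical per-document topic probabilities standing in for model output.
doc_topics = [
    [(0, 0.9), (1, 0.1)],
    [(0, 0.2), (1, 0.8)],
    [(0, 0.6), (1, 0.4)],
]
rows = top_documents_per_topic(doc_topics, top_n=2)

buffer = io.StringIO()  # an in-memory stand-in for a real CSV file
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a handle opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.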

1 answer

pyspark LDA get words in topics

I am trying to run LDA. I am not applying it to words and documents, but to error messages and error causes. Each row is an error and each column is an error cause. A cell is 1 if the error cause was active, and 0 if it was not. Now I am…

apache-spark pyspark lda topic-modeling

asked Nov 26 '18 at 10:15

LN_P

1 answer

pyLDAvis | Could I get "Top-30 Most Relevant Terms for Topic"?

During topic model visualization with LDAvis, I found that the word list shown under "Slide to adjust relevance metric" varies depending on the topic and lambda values. Is there a way to get this word list? I want to get the representative words that vary depending on…

python-3.x nlp lda topic-modeling

asked Aug 27 '18 at 07:42

seungheondoh

1 answer

sklearn LatentDirichletAllocation topic inference on new corpus

I have been using the sklearn.decomposition.LatentDirichletAllocation module to explore a corpus of documents. After a number of iterations of training and adjusting the model (i.e. adding stopwords and synonyms, varying the number of topics), I am…

python scikit-learn lda topic-modeling

asked Aug 02 '18 at 14:03

J. Veenkamp
