Questions tagged [topic-modeling]

Topic models describe how frequently topics occur in documents. A "topic" is a group of words that tend to occur together.
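
As a toy illustration of "words that tend to occur together", the Python sketch below counts, for every pair of words, how many documents contain both. The four-sentence corpus and the function name `cooccurrence_counts` are hypothetical. This is not a topic model. It only shows the kind of document-level co-occurrence signal that topic models pick up on, with no probabilistic inference at all.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, how many documents contain both words."""
    counts = Counter()
    for doc in documents:
        # Unique words per document, sorted so each pair has one canonical order.
        vocab = sorted(set(doc.lower().split()))
        counts.update(combinations(vocab, 2))
    return counts


# Hypothetical toy corpus mirroring the dog/cat example.
docs = [
    "the dog chewed a bone",
    "a dog buried the bone",
    "the cat said meow",
    "my cat will meow at night",
]
pairs = cooccurrence_counts(docs)
print(pairs[("bone", "dog")])  # 2
print(pairs[("cat", "meow")])  # 2
print(pairs[("cat", "dog")])   # 0
```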

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, and "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
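
The intuition above can be written as a mixture: the probability of a word in a document is the sum, over topics, of the document's share of each topic times that topic's probability for the word, i.e. p(w | d) = Σ_k p(k | d) · p(w | k). LDA-style models relate topics to word frequencies in this way. The Python sketch below uses two hand-made topics; the topic names `dogs`/`cats`, the tiny vocabulary and all the numbers are hypothetical, chosen only to mirror the dog/cat example.

```python
# Two hypothetical topics, each a probability distribution over a tiny vocabulary.
topics = {
    "dogs": {"dog": 0.4, "bone": 0.3, "cat": 0.05, "meow": 0.05, "the": 0.2},
    "cats": {"dog": 0.05, "bone": 0.05, "cat": 0.4, "meow": 0.3, "the": 0.2},
}


def word_probability(word, proportions):
    """p(word | document) = sum over topics of p(topic | document) * p(word | topic)."""
    return sum(share * topics[name].get(word, 0.0) for name, share in proportions.items())


# Hypothetical topic proportions for two documents.
mostly_dogs = {"dogs": 0.8, "cats": 0.2}
mostly_cats = {"dogs": 0.2, "cats": 0.8}

for word in ("dog", "meow", "the"):
    print(word,
          f"{word_probability(word, mostly_dogs):.2f}",
          f"{word_probability(word, mostly_cats):.2f}")
# dog 0.33 0.12
# meow 0.10 0.25
# the 0.20 0.20
```

"dog" is more likely in the dog-heavy document and "meow" in the cat-heavy one, while "the", which both topics share equally, is unaffected by the mix.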

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
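
LDA is usually described through its generative story: each topic is a distribution over words drawn from a Dirichlet prior, each document draws its own topic proportions from another Dirichlet prior, and every word is produced by first picking a topic from those proportions and then picking a word from that topic. The Python sketch below simulates that story using only the standard library. The helper names, the vocabulary and the prior value 0.5 are assumptions for illustration. It only generates synthetic data; fitting LDA (inferring topics from real documents) is the job of libraries such as Mallet or Gensim.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topics, vocab, rng):
    """Generate one document following LDA's generative story.

    alpha  -- Dirichlet prior over this document's topic proportions (one value per topic)
    topics -- list of per-topic word distributions, each aligned with vocab
    """
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this document
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this word
        w = rng.choices(vocab, weights=topics[z])[0]           # word drawn from that topic
        words.append(w)
    return theta, words


rng = random.Random(0)
vocab = ["dog", "bone", "cat", "meow", "the"]
# Each topic is itself drawn from a symmetric Dirichlet prior over the vocabulary.
topics = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(2)]
theta, words = generate_document(8, [0.5, 0.5], topics, vocab, rng)
print(theta, words)
```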

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (Scala)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

1 answer

How to interpret LDA components (using sklearn)?

I used Latent Dirichlet Allocation (sklearn implementation) to analyse about 500 scientific article abstracts and got topics containing the most important words (in German). My problem is interpreting the values associated with the most…

asked Feb 01 '16 at 20:53

LSz

3 answers

Evaluation of topic modeling: How to understand a coherence value / c_v of 0.4, is it good or bad?

I need to know whether a coherence score of 0.4 is good or bad. I use LDA as my topic modelling algorithm. What is the average coherence score in this context?

data-science lda topic-modeling

asked Feb 19 '19 at 09:23

User Mohamed

1 answer

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically, I am trying to do supervised latent Dirichlet allocation (sLDA). In the linked package, there's an slda.em function. However, what confuses me is that it asks for alpha, eta and variance parameters. As…

r topic-modeling dirichlet latent-semantic-analysis

asked Apr 27 '16 at 23:40

Alex R.

1 answer

Spark MLlib LDA, how to infer the topics distribution of a new unseen document?

I am interested in applying LDA topic modelling using Spark MLlib. I have checked the code and the explanations here, but I couldn't find how to then use the model to find the topic distribution of a new, unseen document.

apache-spark lda apache-spark-mllib topic-modeling

asked Sep 16 '15 at 09:22

Rami

2 answers

__init__() got an unexpected keyword argument 'cachedir' when importing top2vec

I keep getting this error when importing top2vec. TypeError Traceback (most recent call last) Cell In [1], line 1 ----> 1 from top2vec import Top2Vec File…

python machine-learning topic-modeling

asked Sep 23 '22 at 15:50

Redwan Hossain Arnob

2 answers

Gensim LDA topic assignment

I am hoping to assign each document to one topic using LDA. I realise that what LDA gives you is a distribution over topics. However, as you can see from the last line below, I assign each document to the most probable topic. My question is this: I have to…

gensim lda topic-modeling

asked Oct 11 '16 at 03:07

sachinruk

1 answer

Understanding LDA / topic modelling -- too much topic overlap

I'm new to topic modelling / Latent Dirichlet Allocation and have trouble understanding how I can apply the concept to my dataset (or whether it's the correct approach). I have a small number of literary texts (novels) and would like to extract some…

python nlp gensim lda topic-modeling

asked Sep 20 '17 at 15:30

zinfandel

2 answers

What is the best way to obtain the optimal number of topics for a LDA-Model using Gensim?

I am trying to obtain the optimal number of topics for an LDA model within Gensim. One method I found is to calculate the log likelihood for each model and compare the models against each other, e.g. at The input parameters for using latent Dirichlet…

python text-mining lda gensim topic-modeling

asked Aug 31 '15 at 13:58

Akantor

5 answers

Visualizing an LDA model, using Python

I have an LDA model with the 10 most common topics in 10K documents. Right now it's just an overview of the words with the corresponding probability distribution for each topic. I was wondering if there is something available for Python to visualize these…

python data-visualization lda topic-modeling

asked May 22 '15 at 13:07

mvh

2 answers

Making gsub only replace entire words?

(I'm using R.) For a list of words that's called "goodwords.corpus", I am looping through the documents in a corpus, and replacing each of the words on the list "goodwords.corpus" with the word + a number. So for example if the word "good" is on the…

r gsub topic-modeling

asked Apr 06 '14 at 00:37

user2303557

3 answers

How to predict the topic of a new query using a trained LDA model using gensim?

I have trained a corpus for LDA topic modelling using gensim. Going through the tutorial on the gensim website (this is not the whole code): question = 'Changelog generation from Github issues?'; temp = question.lower() for i in…

python nlp lda topic-modeling gensim

asked Apr 28 '13 at 10:39

Animesh Pandey

3 answers

How to understand the output of Topic Model class in Mallet?

As I'm trying out the example code in the topic modeling developer's guide, I really want to understand the meaning of its output. First, during the run, it prints: Coded LDA: 10 topics, 4 topic bits, 1111 topic mask max…

machine-learning topic-modeling mallet

asked Dec 09 '11 at 15:02

Matt

1 answer

LDA Topic Model Performance - Topic Coherence Implementation for scikit-learn

I have a question around measuring/calculating topic coherence for LDA models built in scikit-learn. Topic Coherence is a useful metric for measuring the human interpretability of a given LDA topic model. Gensim's CoherenceModel allows Topic…

scikit-learn nlp gensim lda topic-modeling

asked Aug 30 '18 at 18:01

learning-new-things-guy

5 answers

How to access topic words only in gensim

I built an LDA model using Gensim and I want to get only the topic words. How can I get just the words of the topics, with no probabilities and no IDs? I tried the print_topics() and show_topics() functions in gensim, but I can't get clean words! This…

python nlp gensim lda topic-modeling

asked Oct 03 '17 at 01:58

Muhammed Eltabakh

2 answers

What is the relation between topic modeling and document clustering?

Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique for document clustering?

cluster-analysis topic-modeling unsupervised-learning

asked Mar 19 '13 at 02:48

afs
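
The reasoning in the last question can be made concrete: a topic model gives each document a distribution over topics, and taking the most probable topic per document turns that into a hard clustering (the same step described in the "Gensim LDA topic assignment" question). The Python sketch below assumes the document-topic distributions have already been obtained from a fitted model; the values and the function name `hard_clusters` are hypothetical. Note that the argmax step discards the mixed-membership information the topic model itself provides.

```python
def hard_clusters(doc_topic):
    """Assign each document to its single most probable topic (a hard clustering).

    doc_topic -- one topic distribution per document (each row sums to 1).
    Returns {topic_index: [document indices]}; ties go to the lower topic index.
    """
    clusters = {}
    for doc_id, dist in enumerate(doc_topic):
        best = max(range(len(dist)), key=dist.__getitem__)
        clusters.setdefault(best, []).append(doc_id)
    return clusters


# Hypothetical document-topic distributions, e.g. from a fitted LDA model.
doc_topic = [
    [0.85, 0.15],
    [0.10, 0.90],
    [0.55, 0.45],  # mixed document: the hard label hides its 45% share of topic 1
    [0.20, 0.80],
]
print(hard_clusters(doc_topic))  # {0: [0, 2], 1: [1, 3]}
```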
