Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
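
The intuition above can be sketched in a few lines of plain Python. This is only a toy illustration, not a fitted model: the topic names, the word probabilities, and the helper `word_probability` are all made up for the example. It shows the mixture idea shared by models such as LDA, where the probability of a word in a document is the sum over topics of p(topic | document) * p(word | topic).

```python
# Hypothetical toy topics: each maps words to probabilities that sum to 1.
# These numbers are invented for illustration, not learned from data.
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.3, "cat": 0.2, "meow": 0.0},
    "cats": {"cat": 0.5, "meow": 0.4, "dog": 0.1, "bone": 0.0},
}


def word_probability(word, topic_mix):
    """Return p(word | doc) = sum_k p(topic_k | doc) * p(word | topic_k)."""
    return sum(
        weight * TOPICS[topic].get(word, 0.0)
        for topic, weight in topic_mix.items()
    )


# Two hypothetical documents, described only by their topic proportions.
mostly_dogs = {"dogs": 0.9, "cats": 0.1}
mostly_cats = {"dogs": 0.1, "cats": 0.9}

for word in ("dog", "bone", "cat", "meow"):
    print(
        word,
        round(word_probability(word, mostly_dogs), 3),
        round(word_probability(word, mostly_cats), 3),
    )
```

With these made-up numbers, "dog" and "bone" come out more likely in the dog-heavy document and "cat" and "meow" in the cat-heavy one. A real topic model works in the other direction: it observes only the words and infers both the topics and the per-document proportions.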

Generative models (i.e. the statistical models used for topic modelling)

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)

980 questions
15
votes
1 answer

How to interpret LDA components (using sklearn)?

I used Latent Dirichlet Allocation (sklearn implementation) to analyse about 500 scientific article abstracts and I got topics containing the most important words (in German). My problem is to interpret these values associated with the most…
LSz
  • 161
  • 1
  • 6
14
votes
3 answers

Evaluation of topic modeling: How to understand a coherence value / c_v of 0.4, is it good or bad?

I need to know whether a coherence score of 0.4 is good or bad. I use LDA as the topic modelling algorithm. What is the average coherence score in this context?
User Mohamed
  • 169
  • 1
  • 1
  • 4
14
votes
1 answer

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically I am trying to do supervised latent dirichlet allocation (slda). In the linked package, there's an slda.em function. However what confuses me is that it asks for alpha, eta and variance parameters. As…
Alex R.
  • 1,397
  • 3
  • 18
  • 33
14
votes
1 answer

Spark MLlib LDA, how to infer the topics distribution of a new unseen document?

I am interested in applying LDA topic modelling using Spark MLlib. I have checked the code and the explanations here, but I couldn't find how to then use the model to find the topic distribution of a new, unseen document.
Rami
  • 8,044
  • 18
  • 66
  • 108
12
votes
2 answers

__init__() got an unexpected keyword argument 'cachedir' when importing top2vec

I keep getting this error when importing top2vec. TypeError Traceback (most recent call last) Cell In [1], line 1 ----> 1 from top2vec import Top2Vec File…
12
votes
2 answers

Gensim LDA topic assignment

I am hoping to assign each document to one topic using LDA. Now I realise that what you get is a distribution over topics from LDA. However as you see from the last line below I assign it to the most probable topic. My question is this. I have to…
sachinruk
  • 9,571
  • 12
  • 55
  • 86
11
votes
1 answer

Understanding LDA / topic modelling -- too much topic overlap

I'm new to topic modelling / Latent Dirichlet Allocation and have trouble understanding how I can apply the concept to my dataset (or whether it's the correct approach). I have a small number of literary texts (novels) and would like to extract some…
zinfandel
  • 428
  • 5
  • 12
11
votes
2 answers

What is the best way to obtain the optimal number of topics for a LDA-Model using Gensim?

I am trying to obtain the optimal number of topics for an LDA-model within Gensim. One method I found is to calculate the log likelihood for each model and compare each against each other, e.g. at The input parameters for using latent Dirichlet…
Akantor
  • 151
  • 1
  • 1
  • 6
11
votes
5 answers

Visualizing an LDA model, using Python

I have an LDA model with the 10 most common topics in 10K documents. Now it's just an overview of the words with the corresponding probability distribution for each topic. I was wondering if there is something available for Python to visualize these…
mvh
  • 189
  • 1
  • 2
  • 20
11
votes
2 answers

Making gsub only replace entire words?

(I'm using R.) For a list of words that's called "goodwords.corpus", I am looping through the documents in a corpus, and replacing each of the words on the list "goodwords.corpus" with the word + a number. So for example if the word "good" is on the…
user2303557
  • 225
  • 1
  • 6
  • 15
11
votes
3 answers

How to predict the topic of a new query using a trained LDA model using gensim?

I have trained a corpus for LDA topic modelling using gensim. Going through the tutorial on the gensim website (this is not the whole code): question = 'Changelog generation from Github issues?'; temp = question.lower() for i in…
Animesh Pandey
  • 5,900
  • 13
  • 64
  • 130
10
votes
3 answers

How to understand the output of Topic Model class in Mallet?

As I'm trying out the examples code on topic modeling developer's guide, I really want to understand the meaning of the output of that code. First during the running process, it gives out: Coded LDA: 10 topics, 4 topic bits, 1111 topic mask max…
Matt
  • 741
  • 1
  • 6
  • 17
10
votes
1 answer

LDA Topic Model Performance - Topic Coherence Implementation for scikit-learn

I have a question around measuring/calculating topic coherence for LDA models built in scikit-learn. Topic Coherence is a useful metric for measuring the human interpretability of a given LDA topic model. Gensim's CoherenceModel allows Topic…
10
votes
5 answers

How to access topic words only in gensim

I built an LDA model using Gensim and I want to get the topic words only. How can I get just the words of the topics, with no probabilities and no IDs? I tried the print_topics() and show_topics() functions in gensim, but I can't get clean words! This…
Muhammed Eltabakh
  • 375
  • 1
  • 10
  • 24
10
votes
2 answers

What is the relation between topic modeling and document clustering?

Topic modeling identifies distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to do document clustering?
afs
  • 167
  • 1
  • 9