Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative statistical model that explains sets of observations through unobserved (latent) groups, which account for why some parts of the data are similar.

When the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is generated by one of the document's topics. In other words, LDA represents documents as mixtures of topics, each of which emits words with certain probabilities.
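The generative story above can be sketched in plain Python. This is a minimal illustration that assumes symmetric Dirichlet priors; the helper names and the `alpha`/`eta` values are hypothetical choices, not any library's API. Each topic is a word distribution drawn from Dirichlet(eta), each document draws its topic proportions from Dirichlet(alpha), and every word first picks a topic and then a word from that topic.

```python
import random


def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha), built by normalising Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Return index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, num_docs, doc_len, alpha=0.1, eta=0.1, seed=0):
    """Generate toy documents following LDA's generative process (hypothetical helper)."""
    rng = random.Random(seed)
    # One word distribution per topic, drawn from Dirichlet(eta).
    topics = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Topic proportions for this document, drawn from Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_index(theta, rng)       # pick the topic responsible for this word
            w = sample_index(topics[z], rng)   # emit a word from that topic
            words.append(vocab[w])
        docs.append(words)
    return topics, docs


topics, docs = generate_corpus(["apple", "pear", "goal", "match", "vote"],
                               num_topics=2, num_docs=3, doc_len=8)
```

Fitting LDA is the reverse problem: given only `docs`, recover plausible `topics` and per-document proportions.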

It should not be confused with Linear Discriminant Analysis (also abbreviated LDA), a supervised learning procedure for classifying observations into a set of categories.

1175 questions
11 votes, 1 answer

Understanding LDA / topic modelling -- too much topic overlap

I'm new to topic modelling / Latent Dirichlet Allocation and have trouble understanding how I can apply the concept to my dataset (or whether it's the correct approach). I have a small number of literary texts (novels) and would like to extract some…
zinfandel
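One way to make "too much overlap" measurable is to compare the topics' word distributions pairwise. The sketch below assumes each topic is a list of word probabilities over a shared vocabulary (for a gensim model, a row of `lda.get_topics()`) and ranks topic pairs by Hellinger distance; pairs near 0 are near-duplicates. gensim's `LdaModel.diff` offers a similar comparison with a Hellinger option, but check its documentation for your version.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def most_overlapping(topic_word, top=3):
    """Return (distance, i, j) for the `top` most similar topic pairs, closest first."""
    pairs = [(hellinger(topic_word[i], topic_word[j]), i, j)
             for i, j in combinations(range(len(topic_word)), 2)]
    return sorted(pairs)[:top]


# Toy topics over a 3-word vocabulary (hypothetical numbers).
toy = [[0.50, 0.50, 0.00], [0.45, 0.55, 0.00], [0.00, 0.00, 1.00]]
closest = most_overlapping(toy, top=1)
```

If many pairs sit close to 0, very frequent words shared by every topic are a typical suspect, so filtering them before training is a reasonable first thing to try.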
11 votes, 2 answers

What is the best way to obtain the optimal number of topics for an LDA model using Gensim?

I am trying to obtain the optimal number of topics for an LDA model within Gensim. One method I found is to calculate the log likelihood for each model and compare the models against each other, e.g. at "The input parameters for using latent Dirichlet…
Akantor
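The log-likelihood comparison mentioned in the question can be sketched as follows, assuming you already have the total held-out log-likelihood (natural log) of each candidate model and the number of held-out tokens; the numbers in the example are hypothetical. Lower perplexity means a better fit, but held-out likelihood may keep improving as topics are added and does not always track interpretability, so it is commonly cross-checked with topic coherence.

```python
import math


def perplexity(total_log_likelihood, num_tokens):
    """Perplexity = exp(-average log-likelihood per token); lower is better."""
    return math.exp(-total_log_likelihood / num_tokens)


def pick_num_topics(heldout_ll, num_tokens):
    """heldout_ll maps k -> total held-out log-likelihood of a k-topic model."""
    scores = {k: perplexity(ll, num_tokens) for k, ll in heldout_ll.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical held-out log-likelihoods for models with 5, 10 and 20 topics.
best_k, scores = pick_num_topics({5: -1000.0, 10: -900.0, 20: -950.0}, num_tokens=100)
```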
11 votes, 5 answers

Visualizing an LDA model, using Python

I have an LDA model with the 10 most common topics in 10K documents. Right now it's just an overview of the words with the corresponding probability distribution for each topic. I was wondering if there is something available for Python to visualize these…
mvh
11 votes, 3 answers

How to predict the topic of a new query with a trained LDA model in gensim?

I have trained a corpus for LDA topic modelling using gensim. Going through the tutorial on the gensim website (this is not the whole code): question = 'Changelog generation from Github issues?'; temp = question.lower() for i in…
Animesh Pandey
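With gensim, the usual route is to convert the new query with the same dictionary and ask the model for its topic mixture, e.g. `lda.get_document_topics(dictionary.doc2bow(tokens))`. To show what that inference does conceptually, the sketch below keeps a trained topic-word matrix fixed and estimates the new document's topic proportions with a few EM iterations. It is a simplified stand-in (plain EM with pseudo-count smoothing), not gensim's variational update; `topic_word`, `alpha` and `iters` are hypothetical inputs.

```python
def infer_topics(word_ids, topic_word, alpha=0.1, iters=50):
    """Estimate topic proportions for an unseen document with topics held fixed.

    word_ids: vocabulary ids of the document's tokens (repeats allowed).
    topic_word: topic_word[t][w] = P(word w | topic t) from a trained model.
    """
    k = len(topic_word)
    theta = [1.0 / k] * k
    for _ in range(iters):
        expected = [alpha] * k  # pseudo-counts act as simple smoothing
        for w in word_ids:
            # E-step: how responsible each topic is for this token.
            resp = [theta[t] * topic_word[t][w] for t in range(k)]
            norm = sum(resp)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for t in range(k):
                expected[t] += resp[t] / norm
        # M-step: renormalise expected counts into proportions.
        total = sum(expected)
        theta = [c / total for c in expected]
    return theta


topic_word = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical 2 topics x 2 words
theta = infer_topics([0, 0, 1], topic_word)
predicted = max(range(len(theta)), key=theta.__getitem__)
```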
10 votes, 1 answer

LDA Topic Model Performance - Topic Coherence Implementation for scikit-learn

I have a question around measuring/calculating topic coherence for LDA models built in scikit-learn. Topic Coherence is a useful metric for measuring the human interpretability of a given LDA topic model. Gensim's CoherenceModel allows Topic…
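For scikit-learn models, one option is to compute a coherence score yourself from each topic's top words (for example, the highest-weighted columns of `lda.components_` mapped back through the vectorizer's vocabulary). The sketch below implements the UMass measure, which scores every pair of top words by how often they co-occur in the same documents. It assumes tokenised documents and top words ordered most probable first; libraries differ in smoothing and in whether they sum or average over pairs, so scores are not directly comparable across tools.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: the topic's top words, most probable first.
    documents: list of token lists used to count (co-)document frequencies.
    Higher means the top words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denom = doc_freq(top_words[j])
            if denom:  # skip words absent from the corpus (a choice; tools differ)
                score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / denom)
    return score


docs = [["bank", "loan"], ["bank", "loan"], ["bank", "river"], ["river"]]
```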
10 votes, 5 answers

How to access topic words only in gensim

I built an LDA model using Gensim and I want to get only the topic words. How can I get just the words of the topics, with no probabilities and no IDs? I tried the print_topics() and show_topics() functions in gensim, but I can't get clean words! This…
Muhammed Eltabakh
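gensim can return topics as (word, probability) pairs instead of formatted strings, e.g. `lda.show_topic(topic_id, topn=10)` or `lda.show_topics(formatted=False)`, and dropping the probabilities is then a list comprehension. If you only have the formatted strings from `print_topics()`, which look like `0.054*"market" + 0.030*"stock"`, a regular expression can pull out the quoted words; that parser is a sketch that assumes words never contain a double quote.

```python
import re

# Captures the quoted word in each 'weight*"word"' term.
_TERM = re.compile(r'\*"(.*?)"')


def words_from_pairs(word_prob_pairs):
    """Keep only the words from a list of (word, probability) pairs."""
    return [word for word, _ in word_prob_pairs]


def words_from_formatted(topic_string):
    """Extract words from a string such as '0.054*"market" + 0.030*"stock"'."""
    return _TERM.findall(topic_string)
```

For every topic at once, something like `{tid: words_from_pairs(pairs) for tid, pairs in lda.show_topics(num_topics=-1, formatted=False)}` avoids string parsing entirely.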
9 votes, 1 answer

The relationship between latent Dirichlet allocation and documents clustering

I would like to clarify the relationship between latent Dirichlet allocation (LDA) and the generic task of document clustering. The LDA analysis tends to output the topic proportions for each document. If my understanding is correct, this is not…
user785099
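LDA gives each document a soft, mixed membership (its topic proportions), whereas classic clustering assigns each document to exactly one group. Taking the most probable topic per document turns the former into the latter, at the cost of discarding the mixture. The sketch assumes the proportions are already available as one list per document; the numbers below are made up.

```python
from collections import defaultdict


def hard_assign(doc_topics):
    """Index of the most probable topic for each document (first index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topics]


def group_by_topic(doc_topics):
    """Map topic index -> list of document indices assigned to it."""
    clusters = defaultdict(list)
    for doc_id, topic in enumerate(hard_assign(doc_topics)):
        clusters[topic].append(doc_id)
    return dict(clusters)


theta = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
```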
9 votes, 1 answer

How to get the document_topics distribution of all documents in gensim LDA?

I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code: dictionary = Dictionary(docs) corpus = [dictionary.doc2bow(doc) for doc in docs] from gensim.models import LdaModel num_topics =…
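`get_document_topics` returns a sparse list of `(topic_id, probability)` pairs per document and may leave out topics below its probability threshold, so rows can have different lengths. A small helper can turn those lists into a dense documents-by-topics matrix; the input shape is the assumption here, and the commented gensim call is illustrative.

```python
def to_dense(sparse_doc_topics, num_topics):
    """Convert per-document lists of (topic_id, prob) into equal-length rows.

    Topics missing from a document's list are filled with 0.0.
    """
    dense = []
    for pairs in sparse_doc_topics:
        row = [0.0] * num_topics
        for topic_id, prob in pairs:
            row[topic_id] = prob
        dense.append(row)
    return dense


# With gensim this input would come from something like:
#   sparse = [lda.get_document_topics(bow, minimum_probability=0.0) for bow in corpus]
sparse = [[(0, 0.9)], [(1, 0.6), (2, 0.4)]]
matrix = to_dense(sparse, num_topics=3)
```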
9 votes, 4 answers

pyLDAvis: Validation error on trying to visualize topics

I tried generating topics using gensim for 300000 records. When I try to visualize the topics, I get a validation error. I can print the topics after model training, but it fails when using pyLDAvis: # Running and Training LDA model on the document term…
Hackerds
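The error message itself is cut off here, so this covers only one common suspect: pyLDAvis validates its inputs and rejects, among other things, distributions whose rows do not sum to 1, which can happen with empty documents or float rounding. A sketch of a pre-check, assuming you have already extracted the document-topic rows and document lengths as plain lists before calling pyLDAvis:

```python
def clean_doc_topics(doc_topic, doc_lengths, eps=1e-12):
    """Renormalise each document's topic row to sum to 1 and drop empty documents.

    Returns (rows, lengths, dropped_indices) so other per-document inputs can be
    filtered the same way.
    """
    rows, lengths, dropped = [], [], []
    for i, (row, n) in enumerate(zip(doc_topic, doc_lengths)):
        total = sum(row)
        if n == 0 or total <= eps:
            dropped.append(i)
            continue
        rows.append([x / total for x in row])
        lengths.append(n)
    return rows, lengths, dropped


rows, lengths, dropped = clean_doc_topics([[0.5, 0.49], [0.0, 0.0], [2.0, 2.0]], [10, 0, 5])
```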
9 votes, 3 answers

gensim.interfaces.TransformedCorpus - How to use it?

I'm relatively new to the world of Latent Dirichlet Allocation. I am able to generate an LDA model following the Wikipedia tutorial, and I'm able to generate an LDA model with my own documents. My next step is to understand how I can use a previous…
Marco Oliveira
9 votes, 2 answers

Online learning of LDA model in Spark

Is there a way to train an LDA model in an online-learning fashion, i.e. loading a previously trained model and updating it with new documents?
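Online variational LDA (the approach behind, for example, gensim's `LdaModel.update` and Spark's online LDA optimizer) folds each new mini-batch into the existing topic-word parameters with a decaying step size rho_t = (tau0 + t) ** (-kappa). The sketch shows only that blending rule, assuming `lam` is the current topic-word parameter matrix and `lam_hat` the estimate computed from the new batch; how a particular library persists and reloads these parameters is not shown and should be checked in its documentation.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """rho_t = (tau0 + t) ** -kappa; kappa in (0.5, 1] is the range needed for convergence."""
    return (tau0 + t) ** (-kappa)


def online_update(lam, lam_hat, t, tau0=1.0, kappa=0.7):
    """Blend current topic-word parameters with the estimate from mini-batch t."""
    rho = step_size(t, tau0, kappa)
    return [[(1.0 - rho) * old + rho * new for old, new in zip(row, row_hat)]
            for row, row_hat in zip(lam, lam_hat)]


# Hypothetical 1-topic, 2-word parameters; t is the number of batches already seen,
# so later batches receive smaller weight.
lam = [[1.0, 3.0]]
lam = online_update(lam, [[3.0, 1.0]], t=3, tau0=1.0, kappa=0.5)
```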
9 votes, 1 answer

Topic modelling - Assign a document with top 2 topics as category label - sklearn Latent Dirichlet Allocation

I am now going through the LDA (Latent Dirichlet Allocation) topic modelling method to help extract topics from a set of documents. From what I have understood from the link below, this is an unsupervised learning approach to categorize /…
Bala
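Once you have the document-topic matrix (in scikit-learn, the output of `LatentDirichletAllocation.transform`), the two highest-scoring topics per document can be picked with `heapq.nlargest`. The label names below are hypothetical; LDA itself only gives topic indices, and naming them is a manual step.

```python
import heapq


def top_topics(doc_topic_row, n=2):
    """Indices of the n most probable topics for one document, best first."""
    return heapq.nlargest(n, range(len(doc_topic_row)), key=doc_topic_row.__getitem__)


def label_documents(doc_topic, topic_names, n=2):
    """Attach the names of each document's top-n topics (names are user-supplied)."""
    return [[topic_names[t] for t in top_topics(row, n)] for row in doc_topic]


names = ["sports", "politics", "finance", "health"]  # hypothetical labels
doc_topic = [[0.1, 0.5, 0.3, 0.1], [0.6, 0.05, 0.05, 0.3]]
labels = label_documents(doc_topic, names)
```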
9 votes, 1 answer

gensim LdaMulticore not multiprocessing?

When I run gensim's LdaMulticore model on a machine with 12 cores, using: lda = LdaMulticore(corpus, num_topics=64, workers=10) I get a logging message that says "using serial LDA version on this node". A few lines later, I see another logging…
Edward Newell
9 votes, 1 answer

Use scikit-learn TfIdf with gensim LDA

I've used various versions of TF-IDF in scikit-learn to model some text data. vectorizer = TfidfVectorizer(min_df=1,stop_words='english') The resulting data X is in this format: ' with…
ADJ
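gensim models consume a corpus of bag-of-words lists, one `(term_id, weight)` list per document, while `TfidfVectorizer` returns a SciPy CSR matrix with documents as rows. gensim ships `gensim.matutils.Sparse2Corpus` for this conversion (rows as documents need `documents_columns=False`); the stdlib sketch below spells out the same mapping from the CSR arrays `X.indptr`, `X.indices` and `X.data`, and the `id2word` mapping would come from the vectorizer's `vocabulary_`. Note that LDA is defined over word counts, so `CountVectorizer` output is usually a more natural input than TF-IDF weights.

```python
def csr_to_bow(indptr, indices, data):
    """Turn CSR arrays (rows = documents, columns = term ids) into gensim-style
    bag-of-words: one list of (term_id, weight) pairs per document."""
    corpus = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        corpus.append(list(zip(indices[start:end], data[start:end])))
    return corpus


# Toy 2-document, 3-term matrix in CSR form (hypothetical values).
corpus = csr_to_bow(indptr=[0, 2, 3], indices=[0, 2, 1], data=[0.5, 0.7, 1.0])
# With real inputs: csr_to_bow(X.indptr, X.indices, X.data)
# id2word = {i: w for w, i in vectorizer.vocabulary_.items()}
```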
9 votes, 4 answers

Gensim: How to save LDA model's produced topics to a readable format (csv,txt,etc)?

Last parts of the code: lda = LdaModel(corpus=corpus,id2word=dictionary, num_topics=2) print lda bash output: INFO : adding document #0 to Dictionary(0 unique tokens) INFO : built Dictionary(18 unique tokens) from 5 documents (total 20 corpus…
jeremy.ting
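To get the topics into a readable file, ask gensim for (word, probability) pairs rather than relying on the log output, e.g. `lda.show_topics(num_topics=-1, num_words=10, formatted=False)`, and write them with the standard `csv` module. The sketch assumes that list-of-`(topic_id, [(word, prob), ...])` shape; the column layout is an arbitrary choice.

```python
import csv
import io


def write_topics_csv(topics, out):
    """Write one row per (topic, word): topic id, rank, word, probability."""
    writer = csv.writer(out)
    writer.writerow(["topic", "rank", "word", "probability"])
    for topic_id, pairs in topics:
        for rank, (word, prob) in enumerate(pairs, start=1):
            writer.writerow([topic_id, rank, word, f"{prob:.4f}"])


def topics_to_csv_string(topics):
    """Same output as write_topics_csv, returned as a string."""
    buf = io.StringIO()
    write_topics_csv(topics, buf)
    return buf.getvalue()


# Writing to disk (the csv docs recommend newline="" for file objects):
#   with open("topics.csv", "w", newline="", encoding="utf-8") as f:
#       write_topics_csv(lda.show_topics(num_topics=-1, formatted=False), f)
text = topics_to_csv_string([(0, [("market", 0.054), ("stock", 0.030)])])
```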