Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative probabilistic model in which sets of observations are explained by unobserved (latent) groups that account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, each of which emits words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
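
The generative story in the description above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular library implements LDA: the vocabulary, the two topic-word distributions, the Dirichlet parameter `alpha`, and the helper names `sample_dirichlet` and `generate_document` are all assumptions made up for this example. Real LDA software solves the inverse problem, inferring the topics and mixtures from observed documents.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, vocab, n_words, rng):
    """Generate one document following the LDA generative story."""
    # 1. Draw the document's topic mixture.
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        # 2. For each word position, pick a topic from the mixture...
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. ...then pick a word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Hypothetical toy setup: two topics over a six-word vocabulary.
vocab = ["goal", "match", "team", "vote", "party", "election"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "sports" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "politics" topic
]

rng = random.Random(0)
theta, topics, words = generate_document(topic_word, [0.5, 0.5], vocab, 8, rng)
print("topic mixture:", theta)
print("words:", words)
```

A small `alpha` (below 1) tends to produce documents dominated by one topic, which matches the "small number of topics per document" assumption in the description.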

1175 questions

1 answer

How do you initialize a gensim corpus variable with a csr_matrix?

I have X as a csr_matrix that I obtained using scikit-learn's TF-IDF vectorizer, and y, which is an array. My plan is to create features using LDA; however, I failed to find how to initialize a gensim corpus variable with X as a csr_matrix. In other words,…

asked Mar 27 '13 at 22:12

IssamLaradji

2 answers

Implementing alternative forms of LDA

I am using Latent Dirichlet Allocation with a corpus of news data from six different sources. I am interested in topic evolution and emergence, and want to compare how the sources are alike and different from each other over time. I know that there are…

python r nlp text-mining lda

asked Apr 11 '12 at 19:20

user836015

0 answers

Gensim LDA gives negative log-perplexity value - is it normal and how can I interpret it?

I am currently using Gensim LDA for topic modeling. While tuning hyper-parameters, I found that the model always gives a negative log-perplexity. Is it normal for the model to behave like this (is it even possible)? If it is, is smaller perplexity…

gensim lda perplexity

asked Jul 22 '20 at 02:30

nowheretogo

2 answers

How do I calculate the coherence score of an sklearn LDA model?

Here, best_model_lda is an sklearn-based LDA model and we are trying to find a coherence score for this model. coherence_model_lda = CoherenceModel(model=best_lda_model, texts=data_vectorized, dictionary=dictionary, coherence='c_v') coherence_lda =…

scikit-learn gensim lda

asked Mar 10 '20 at 08:03

Arvind Sudheer

1 answer

Should the "perplexity" (or "score") go up or down in the LDA implementation of Scikit-learn?

I'd like to know what the perplexity and score mean in the LDA implementation of Scikit-learn. Those functions are obscure. At the very least, I need to know whether those values increase or decrease when the model is better. I've searched but it's…

python scikit-learn statistics lda log-likelihood

asked Aug 07 '18 at 20:35

Guillaume Chevalier

1 answer

Automatic labeling of LDA generated topics

I'm trying to categorize customer feedback and I ran an LDA in python and got the following output for 10 topics: (0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" +…

python nlp lda topic-modeling labeling

asked May 15 '17 at 17:41

Arman

1 answer

LDA TopicModels producing list of numbers rather than terms

Bear with me, as I am extremely new to this and working on a project for a course in a certificate program. I have a .csv dataset that I obtained by retrieving bibliometric records from the PubMed and Embase databases. There are 1034 rows. There are…

r lda topicmodels

asked Apr 17 '17 at 02:19

SciLibby

1 answer

Use topic modeling information from LDA as features to perform text classification through SVM

I want to perform text classification using topic modeling information as features that are fed to an SVM classifier. So I was wondering how it is possible to generate topic modeling features by performing LDA on both the training and test…

python classification svm lda

asked Dec 06 '16 at 22:21

asterix

1 answer

LDA interpretation

I use the HMeasure package to apply LDA in my credit risk analysis. I have 11,000 observations and have chosen age and income for the analysis. I don't know exactly how to interpret the R output of LDA, so I don't know if I have chosen the…

r lda risk-analysis linear-discriminant

asked Oct 17 '16 at 13:15

Dalila

1 answer

Finding the number of documents per topic for LDA with scikit-learn

I'm following along with the scikit-learn LDA example here and am trying to understand how I can (if possible) surface how many documents have been labeled as having each one of these topics. I've been poring through the docs for the LDA model here…

scikit-learn lda

asked Feb 07 '16 at 11:15

user139014

1 answer

Understanding LDA Transformed Corpus in Gensim

I tried to examine the contents of the BOW corpus vs. the LDA[BOW Corpus] (transformed by an LDA model trained on that corpus with, say, 35 topics). I found the following output: DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)] LDA 1 : [(29,…

python nlp lda gensim

asked May 07 '14 at 05:48

Ravi Karan

3 answers

Are there any efficient Python libraries for Dynamic Topic Models, preferably extending Gensim?

I'm trying to model Twitter stream data with topic models. Gensim, being an easy-to-use solution, is impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. For a changing content stream like Twitter, Dynamic…

python lda text-analysis topic-modeling gensim

asked Mar 18 '14 at 02:52

Ravi Karan

2 answers

Latent Dirichlet Allocation Solution Example

I am trying to learn about Latent Dirichlet Allocation (LDA). I have basic knowledge of machine learning and probability theory and, based on this blog post http://goo.gl/ccPvE, I was able to develop the intuition behind LDA. However, I still haven't…

lda topic-modeling

asked May 16 '12 at 18:48

user737128

4 answers

Convert one-document-per-line to Blei's lda-c/dtm format for topic modeling?

I am doing Latent Dirichlet Analyses for some research and keep running into a problem. Most LDA software requires documents to be in doclines format, meaning a CSV or other delimited file in which each line represents the entirety of a document.…

nlp dataform lda

asked Jan 05 '12 at 22:53

user836015

1 answer

Gensim module attribute error for wrappers

I want to find the optimal number of topics for LDA. To do this, I used Gensim as follows: def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3): coherence_values = [] model_list = [] for num_topics in…

python gensim lda topic-modeling

asked Apr 14 '21 at 16:35

Tahereh Maghsoudi
