Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative probabilistic model in which sets of observations are explained by unobserved groups that account for why some parts of the data are similar.

When the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is generated by one of the document's topics. In other words, documents are represented as mixtures of topics, and each topic emits words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
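
As a rough illustration of the generative story described above, here is a minimal Python sketch using only the standard library. The vocabulary, the two topic-word distributions, and the `alpha` values are made-up assumptions for illustration, and the helper names `sample_dirichlet` and `generate_document` are hypothetical. Full LDA also draws each topic's word distribution from a Dirichlet prior; here they are fixed by hand to keep the sketch short.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # Per-document topic mixture (often written theta).
    theta = sample_dirichlet(alpha, rng)
    words, assignments = [], []
    for _ in range(length):
        # Pick the topic responsible for this word position...
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # ...then pick a word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Toy setup: every value below is an illustrative assumption.
vocab = ["goal", "match", "team", "vote", "party", "law"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # hypothetical "sports" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # hypothetical "politics" topic
]
alpha = [0.5, 0.5]  # small values favor documents dominated by few topics

rng = random.Random(0)
theta, assignments, words = generate_document(topic_word, vocab, alpha, 8, rng)
print("topic mixture:", theta)
print("words with topic assignments:", list(zip(words, assignments)))
```

Fitting LDA runs this story in reverse: given only the words, inference estimates the per-document topic mixtures and the topic-word distributions.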

1175 questions

1 answer

Can we use a self made corpus for training for LDA using gensim?

I have to apply LDA (Latent Dirichlet Allocation) to get the possible topics from a database of 20,000 documents that I collected. How can I use these documents rather than other available corpora like the Brown Corpus or English Wikipedia as…

python lda gensim

asked Apr 27 '13 at 16:05

Animesh Pandey

2 answers

Gensim LDA Coherence Score Nan

I created a Gensim LDA Model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100,…

python machine-learning gensim lda topic-modeling

asked Feb 16 '20 at 08:03

Ramsha Siddiqui

1 answer

Latent Dirichlet allocation (LDA) in Spark - replicate model

I want to save the LDA model from pyspark ml-clustering package and apply the model to the training & test data-set after saving. However, the results diverge despite setting a seed. My code is the following: 1) Import packages from…

apache-spark pyspark lda

asked Feb 04 '19 at 12:17

raffaelo92

2 answers

python scikit learn, get documents per topic in LDA

I am running LDA on text data, using the example here: My question is: How can I know which documents correspond to which topic? In other words, which documents are about topic 1, for example? Here are my steps: n_features =…

python machine-learning lda topic-modeling

asked Jul 17 '17 at 13:17

passion

1 answer

Is there any way to match Gensim LDA output with topics in pyLDAvis graph?

I need to process the topics in the LDA output (lda.show_topics(num_topics=-1, num_words=100...) and then compare my results with the pyLDAvis graph, but the topics are numbered differently. Is there a way I can match them?

python-3.x gensim lda topic-modeling

asked Apr 06 '17 at 15:52

m.khalil

1 answer

Google Cloud Dataproc configuration issues

I've been encountering various issues in some Spark LDA topic modeling (mainly disassociation errors at seemingly random intervals) I've been running, which I think mainly have to do with insufficient memory allocation on my executors. This would…

apache-spark google-cloud-platform lda google-cloud-dataproc

asked Dec 07 '15 at 18:32

moustachio

1 answer

Gensim get topic for a document (seen document)

I know that after training the lda model for gensim, we can get the topic for an unseen document by: lda = LdaModel(corpus, num_topics=10) doc_lda = lda[doc_bow] But how about the documents that are already used for training? I mean is there a way…

python lda gensim

asked Apr 12 '14 at 15:59

CentAu

3 answers

How to print out the full distribution of words in an LDA topic in gensim?

The lda.show_topics method in the following code only prints the distribution of the top 10 words for each topic. How do I print out the full distribution of all the words in the corpus? from gensim import corpora, models documents = ["Human…

python lda topic-modeling gensim

asked Jul 15 '13 at 20:06

alvas

3 answers

WordCloud Only Supported for TrueType fonts

I am trying to generate a word cloud using the WordCloud module in Python, however I see the following error whenever I call .generate Traceback (most recent call last): File "/mnt/6db3226b-5f96-4257-980d-bb8ec1dad8e7/test.py", line 4, in…

python python-imaging-library visualization lda truetype

asked Apr 28 '23 at 12:16

Matthew

2 answers

pyLDAvis visualization from gensim not displaying the result in google colab

import pyLDAvis.gensim # Visualize the topics pyLDAvis.enable_notebook() vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word) vis The above code displayed the visualization of LDA model in google colab but then after reopening the notebook it…

visualization gensim lda pyldavis

asked Feb 08 '21 at 05:02

Ravi Prajapati

3 answers

Meaning of bar width for pyLDAvis for lambda = 0

Not sure if this is the right forum but I was wondering if anyone understands how to interpret the width of the red vs. blue bars on the right-hand side of pyLDAvis plots when lambda = 0 (see…

python lda topic-modeling

asked Jun 06 '18 at 17:56

user3490622

3 answers

How to get the topic associated with each document using PySpark (2.1.0) LDA?

I am using LDAModel of pyspark to get topics from corpus. My goal is to find topics associated with each document. For that purpose I tried to set topicDistributionCol as per Docs. Since I am new to this, I am not sure what is the purpose of this…

pyspark data-mining lda topic-modeling data-processing

asked Jan 31 '17 at 13:09

Hiren patel

1 answer

Using LDA (topic model): the distributions of each topic over words are similar and "flat"

Latent Dirichlet Allocation (LDA) is a topic model for finding the latent variables (topics) underlying a collection of documents. I'm using the Python gensim package and have two problems: I printed out the most frequent words for each topic (I tried 10,20,50…

python lda topic-modeling gensim

asked Feb 23 '15 at 15:34

Ruby

1 answer

How can I speed up a topic model in R?

Background: I am trying to fit a topic model with the following data and specification: documents = 140 000, words = 3000, and topics = 15. I am using the package topicmodels in R (3.1.2) on a Windows 7 machine (RAM 24 GB, 8 cores). My problem is that…

r machine-learning lda topic-modeling unsupervised-learning

asked Jan 26 '15 at 16:52

Adel

2 answers

Bug in scikit-learn's LDA function - plots show non-zero correlation

I did some LDA using scikit-learn's LDA function and I noticed in my resulting plots that there is a non-zero correlation between LDs. from sklearn.lda import LDA sklearn_lda = LDA(n_components=2) transf_lda = sklearn_lda.fit_transform(X, y) This…

python r scikit-learn lda

asked Jul 28 '14 at 20:06

user2489252
