Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups, and those groups account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, and each topic emits words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
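
The generative story in the tag description can be sketched in a few lines of Python. This is only an illustrative sketch, not any library's implementation: the function names, the toy vocabulary, and the hand-written topic-word probabilities are all assumptions made for the example. In full LDA the topic-word distributions are themselves drawn from a Dirichlet prior; here they are fixed by hand to keep the example short.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, alpha, n_words, rng):
    """Generate one document following the LDA generative story (hypothetical helper)."""
    # 1. Draw the document's topic mixture: theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words, assignments = [], []
    for _ in range(n_words):
        # 2. Pick a topic for this word position from the document's mixture.
        z = rng.choices(range(len(theta)), weights=theta, k=1)[0]
        # 3. Pick a word from that topic's word distribution.
        w = rng.choices(vocab, weights=topic_word[z], k=1)[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Toy setup (assumed for illustration): two topics over a six-word vocabulary.
vocab = ["gene", "dna", "cell", "ball", "team", "goal"]
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # topic 0: biology words only
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # topic 1: sports words only
]

rng = random.Random(0)
theta, assignments, words = generate_document(
    topic_word, vocab, alpha=[0.5, 0.5], n_words=8, rng=rng
)
print("topic mixture:", theta)
print("words with topic assignments:", list(zip(words, assignments)))
```

Fitting an LDA model runs this story in reverse: given only the words, it infers plausible values for the topic mixtures and the topic-word distributions.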

1175 questions

1 answer

Getting topic-word distribution from LDA in scikit-learn

I was wondering if there is a method in the LDA implementation of scikit-learn that returns the topic-word distribution, like the gensim show_topics() method. I checked the documentation but didn't find anything.

python scikit-learn lda

asked May 26 '17 at 18:58

Niro

1 answer

How to monitor convergence of Gensim LDA model?

I can't seem to find it, or probably my knowledge of statistics and its terms is the problem here, but I want to achieve something similar to the graph found at the bottom of the page of the LDA lib from PyPI and observe the uniformity/convergence of the…

python lda gensim convergence

asked Jun 01 '16 at 13:50

ZeferiniX

3 answers

Extract document-topic matrix from Pyspark LDA Model

I have successfully trained an LDA model in spark, via the Python API: from pyspark.mllib.clustering import LDA model=LDA.train(corpus,k=10) This works completely fine, but I now need the document-topic matrix for the LDA model, but as far as I can…

python apache-spark pyspark lda

asked Oct 12 '15 at 02:37

moustachio

2 answers

How to get a complete topic distribution for a document using gensim LDA?

When I train my lda model as such dictionary = corpora.Dictionary(data) corpus = [dictionary.doc2bow(doc) for doc in data] num_cores = multiprocessing.cpu_count() num_topics = 50 lda = LdaMulticore(corpus, num_topics=num_topics, id2word=dictionary,…

python gensim lda

asked Jul 25 '17 at 18:21

PyRsquared

1 answer

Spark LDA consumes too much memory

I'm trying to use Spark MLlib LDA to summarize my document corpus. My problem setting is as follows: about 100,000 documents, about 400,000 unique words, and 100 clusters. I have 16 servers (each has 20 cores and 128GB memory). When I execute LDA with…

apache-spark apache-spark-mllib lda

asked Mar 14 '16 at 03:59

Du Shiqiao

1 answer

How to interpret LDA components (using sklearn)?

I used Latent Dirichlet Allocation (sklearn implementation) to analyse about 500 scientific article abstracts and I got topics containing the most important words (in German). My problem is to interpret these values associated with the most…

python-3.x scikit-learn lda topic-modeling

asked Feb 01 '16 at 20:53

LSz

3 answers

Evaluation of topic modeling: How to understand a coherence value / c_v of 0.4, is it good or bad?

I need to know whether a coherence score of 0.4 is good or bad. I use LDA as the topic modelling algorithm. What is the average coherence score in this context?

data-science lda topic-modeling

asked Feb 19 '19 at 09:23

User Mohamed

1 answer

Understanding parameters in Gensim LDA Model

I am using gensim.models.ldamodel.LdaModel to perform LDA, but I do not understand some of the parameters and cannot find explanations in the documentation. If someone has experience working with this, I would love further details of what these…

python parameters gensim lda

asked Jun 11 '18 at 20:30

Jane Sully

1 answer

Latent Dirichlet allocation (LDA) in Spark

I am trying to write a program in Spark for carrying out Latent Dirichlet allocation (LDA). This Spark documentation page provides a nice example for performing LDA on the sample data. Below is the program: from pyspark.mllib.clustering import LDA,…

python pyspark lda

asked Feb 05 '17 at 10:54

prashanth

1 answer

Spark MLlib LDA, how to infer the topics distribution of a new unseen document?

I am interested in applying LDA topic modelling using Spark MLlib. I have checked the code and the explanations here, but I couldn't find how to then use the model to find the topic distribution of a new, unseen document.

apache-spark lda apache-spark-mllib topic-modeling

asked Sep 16 '15 at 09:22

Rami

7 answers

Hierarchical Dirichlet Process Gensim topic number independent of corpus size

I am using the Gensim HDP module on a set of documents. >>> hdp = models.HdpModel(corpusB, id2word=dictionaryB) >>> topics = hdp.print_topics(topics=-1, topn=20) >>> len(topics) 150 >>> hdp = models.HdpModel(corpusA, id2word=dictionaryA) >>> topics…

python nlp lda gensim

asked Jul 21 '15 at 15:34

Sam Weisenthal

3 answers

Supervised Latent Dirichlet Allocation for Document Classification?

I have a bunch of documents that humans have already classified into some groups. Is there a modified version of LDA which I can use to train a model and then later classify unknown documents with it?

machine-learning nlp classification document-classification lda

asked Nov 25 '12 at 20:12

snøreven

2 answers

pyLDAvis visualization of pyspark generated LDA model

Does anyone have an example of data visualization of an LDA model trained using the PySpark library (specifically using pyLDAvis)? I've seen a lot of examples for GenSim and other libraries but not PySpark. Specifically I'm wondering what to pass…

python apache-spark pyspark lda

asked Jan 24 '17 at 03:57

igodfried

2 answers

Gensim LDA topic assignment

I am hoping to assign each document to one topic using LDA. Now I realise that what you get is a distribution over topics from LDA. However as you see from the last line below I assign it to the most probable topic. My question is this. I have to…

gensim lda topic-modeling

asked Oct 11 '16 at 03:07

sachinruk

3 answers

ImportError: No module named 'sklearn.lda'

When I run classifier.py in the openface demos directory using: classifier.py train ./generated-embeddings/ I get the following error message: --> from sklearn.lda import LDA ModuleNotFoundError: No module named 'sklearn.lda'. I think to have…

python machine-learning scikit-learn lda

asked Oct 16 '17 at 16:41

mauroV8F5
