Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups that account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics that emit words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
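As a concrete illustration of that view, here is a minimal sketch of fitting an LDA model with the gensim library and inspecting both the document-topic mixtures and the topic-word distributions. The toy corpus and all variable names are invented purely for illustration.

```python
# Minimal LDA sketch with gensim; the four "documents" below are made up.
from gensim import corpora
from gensim.models import LdaModel

docs = [
    ["cat", "dog", "pet", "food"],
    ["stock", "market", "price", "trade"],
    ["dog", "pet", "vet", "food"],
    ["trade", "market", "stock", "bank"],
]

dictionary = corpora.Dictionary(docs)                # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]   # bag-of-words vectors

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Each document is a mixture of topics ...
for bow in corpus:
    print(lda.get_document_topics(bow))

# ... and each topic is a distribution over words.
print(lda.print_topics(num_topics=2, num_words=4))
```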

1175 questions
0
votes
1 answer

Improve Document-Topic Probability in LDA

I'm trying to classify IT support tickets into relevant topics using LDA in R. My corpus has 5,550 documents and 1,882 terms. I started with 12,000 terms, but after removing common stop words and other noise words I've landed at 1,800-odd…
Puneet
  • 9
  • 1
0
votes
1 answer

Stipulation of "Good"/"Bad"-Cases in an LDA Model (Using gensim in Python)

I am trying to analyze news snippets in order to identify crisis periods. To do so, I have already downloaded news articles over the past 7 years and have those available. Now, I am applying an LDA (Latent Dirichlet Allocation) model on this…
MKay
  • 3
  • 3
0
votes
1 answer

How to use LDA/biclustering/k-means to conduct temporal clustering in R?

I have a dataset like this, which contains about 1000 passenger IDs and their travel frequency between Temporal 1 and Temporal 12 from Sunday to Saturday. Is it possible to cluster this dataset using biclustering, and how? ID T1 …
Meixu Chen
  • 61
  • 4
0
votes
2 answers

How to print top ten topics using Gensim?

According to the official documentation, there is no natural ordering between the topics in LDA; the num_topics <= self.num_topics subset of all topics returned by show_topics() is therefore arbitrary and may change between two LDA training…
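A sketch for the question above, assuming a fitted gensim LdaModel named `lda`. Because LDA topics have no natural ordering, "top ten" here simply means ten topics from the model, listed with their most probable words.

```python
# In recent gensim versions, show_topics(formatted=True) yields
# (topic_id, "weight*word + ...") pairs; older releases returned plain strings.
for topic_id, words in lda.show_topics(num_topics=10, num_words=10, formatted=True):
    print(topic_id, words)
```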
0
votes
0 answers

Stemming of Text using NLTK in Python

I am trying to implement LDA on a set of tweets treated as a document. During preprocessing, the stemming step raises the error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128). My code is as…
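A sketch for the question above: this UnicodeDecodeError typically means the tweet text is a byte string containing non-ASCII characters (0xe2 begins many UTF-8 punctuation marks), and something downstream is implicitly decoding it as ASCII. Decoding to unicode before stemming usually avoids it; the helper below is illustrative, not the asker's code.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_tokens(raw_tweet):
    # On Python 2, decode bytes to unicode first; on Python 3, str is already unicode.
    if isinstance(raw_tweet, bytes):
        raw_tweet = raw_tweet.decode("utf-8", errors="ignore")
    return [stemmer.stem(token) for token in raw_tweet.split()]
```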
0
votes
1 answer

Seeding words into an LDA topic model in R

I have a dataset of news articles that have been collected based on the criteria that they use the term "euroscepticism" or "eurosceptic". I have been running topic models using the lda package (with dfm matrices built in quanteda) in order to…
0
votes
2 answers

No such file or directory error even though the file exists in Java

I am a newbie to Java and I want to run the JGibbLDA library. As instructed by the documentation, I entered the root directory of JGibbLDA-v.1.0 and ran the command: java -mx512M -cp bin:lib/args4j-2.0.6.jar jgibblda.LDA -est -alpha 0.5 -beta 0.1…
southdoor
  • 431
  • 1
  • 8
  • 22
0
votes
1 answer

Computing the weight of each LDA topic over all the documents in the corpus

I fitted my LDA model and retrieved my topics, and now I am looking for a way to compute the weight/percentage of each topic over the corpus. Surprisingly, I cannot find how to do this; so far my code looks like: ## Libraries to download from…
Economist_Ayahuasca
  • 1,648
  • 24
  • 33
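A sketch for the question above, assuming a fitted gensim LdaModel `lda` and the bag-of-words `corpus` it was trained on; the variable names are illustrative. One common approach is to sum each topic's probability over all documents and normalise.

```python
import numpy as np

num_topics = lda.num_topics
weights = np.zeros(num_topics)

for bow in corpus:
    # minimum_probability=0 so every topic is reported for every document
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0):
        weights[topic_id] += prob

weights /= weights.sum()   # fraction of the corpus attributed to each topic
for topic_id, share in enumerate(weights):
    print("topic %d: %.3f" % (topic_id, share))
```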
0
votes
0 answers

How to use previously generated topic-word distribution matrix for the new LDA topic generation process?

Let's say that we have executed the LDA topic generation process (with Gibbs sampling) once. For the next round of LDA topic generation, how can we make use of the already existing topic matrix? Does any library support this kind of feature?
genonymous
  • 1,598
  • 3
  • 18
  • 27
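One option for the question above, sketched with gensim rather than a Gibbs-sampling implementation: gensim's LdaModel can be saved, reloaded, and updated with new documents, which continues training from the existing topic-word distributions instead of starting from scratch. The file name and `new_corpus` are placeholders.

```python
from gensim.models import LdaModel

lda.save("lda_round1.model")             # persist the first run
lda = LdaModel.load("lda_round1.model")  # ...later, reload it
lda.update(new_corpus)                   # fold in new bag-of-words documents
```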
0
votes
2 answers

Visualize Latent Dirichlet Allocation results

I'm trying to use Latent Dirichlet Allocation (LDA) from the gensim library for Python. Is there any way to display the results of the algorithm over the training set in the form of a graph? Maybe with Venn diagrams, or some charts?
striki
  • 1
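A sketch for the question above: pyLDAvis is a common way to plot gensim LDA output (an inter-topic distance map plus per-topic term bars). It assumes a fitted model `lda`, its `corpus`, and its `dictionary`; note that in newer pyLDAvis releases the module is pyLDAvis.gensim_models rather than pyLDAvis.gensim.

```python
import pyLDAvis
import pyLDAvis.gensim

vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")   # open the HTML file in a browser
```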
0
votes
2 answers

Latent Dirichlet Allocation (LDA) performance by limiting word length for corpus documents

I have been generating topics from the Yelp dataset of customer reviews by using Latent Dirichlet Allocation (LDA) in Python (gensim package). While generating tokens, I am selecting only the words having length >= 3 from the reviews (by using…
triandicAnt
  • 1,328
  • 2
  • 15
  • 40
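A sketch related to the question above: one straightforward way to experiment with the length cutoff is to filter tokens while building the gensim dictionary, then retrain and compare the resulting topics. `reviews` and `MIN_LEN` are illustrative names, not part of the asker's code.

```python
from gensim import corpora

MIN_LEN = 3   # try 2, 3, 4, ... and compare the resulting topics

tokenized = [[w for w in review.lower().split() if len(w) >= MIN_LEN]
             for review in reviews]
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(doc) for doc in tokenized]
```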
0
votes
0 answers

Memory error in python while adding doc to TermDocumentMatrix

tdm = textmining.TermDocumentMatrix() with open('C:/Users/java.txt') as f: for line in f: tdm.add_doc(line) temp = list(tdm.rows(cutoff=1)) vocab = tuple(temp[0]) X = np.array(temp[1:]) …
Tejasvi Rao
  • 53
  • 1
  • 6
0
votes
1 answer

Memory error in python using numpy array

I am getting the following error for this code: model = lda.LDA(n_topics=15, n_iter=50, random_state=1) model.fit(X) topic_word = model.topic_word_ print("type(topic_word): {}".format(type(topic_word))) print("shape:…
Tejasvi Rao
  • 53
  • 1
  • 6
0
votes
2 answers

Convert topicmodels output to JSON

I use the following function to convert the topicmodels output to JSON output to use in ldavis. topicmodels_json_ldavis <- function(fitted, corpus, doc_term){ ## Required packages library(topicmodels) library(dplyr) …
0
votes
3 answers

Gensim - LDA: create a document-topic matrix

I am working on a project where I need to apply topic modelling to a set of documents and I need to create a matrix DT, a D × T matrix, where D is the number of documents and T is the number of topics. DT(ij) contains the number of times a word…
swati saoji
  • 1,987
  • 5
  • 25
  • 35
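A sketch for the question above, assuming a fitted gensim LdaModel `lda` and the bag-of-words `corpus`: it builds a D × T matrix of per-document topic proportions. Note that gensim exposes topic probabilities rather than raw word-assignment counts, so this is an approximation of the matrix the asker describes.

```python
import numpy as np

D, T = len(corpus), lda.num_topics
doc_topic = np.zeros((D, T))

for d, bow in enumerate(corpus):
    # minimum_probability=0 so every topic gets a value for every document
    for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0):
        doc_topic[d, topic_id] = prob

print(doc_topic.shape)   # (D, T)
```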