Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that account for why some parts of the data are similar.

If the observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. LDA represents documents as mixtures of topics that emit words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
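For a concrete sense of this document-topic-word structure, here is a minimal sketch using Gensim in Python; the toy documents, topic count, and other parameter values are illustrative assumptions only.

    from gensim import corpora
    from gensim.models import LdaModel

    # Toy corpus: each document is a list of tokens.
    texts = [
        ["car", "engine", "wheel", "road"],
        ["watch", "strap", "water", "resistance"],
        ["car", "road", "trip", "wheel"],
    ]

    # Map tokens to integer ids and build bag-of-words vectors.
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(tokens) for tokens in texts]

    # Fit a small LDA model; num_topics = 2 is an arbitrary illustrative choice.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                   passes=10, random_state=1)

    # Each document is represented as a mixture of topics ...
    for bow in corpus:
        print(lda.get_document_topics(bow))

    # ... and each topic is a distribution over words.
    print(lda.print_topics(num_words=4))

Several of the sketches below reuse this kind of setup (a trained model `lda`, a `dictionary`, and a bag-of-words `corpus`).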

1175 questions
0
votes
0 answers

Text mining in R - input is an Excel file with each row being one document

I am new to R. I have a CSV file that includes 15,000 rows of text, where each row belongs to one person. I want to do Latent Dirichlet Allocation on it, but first I need to create a term-document matrix. However, I don't know how to make R treat each…
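The question asks about R, but the same preprocessing step can be sketched with Gensim in Python; the file name, column layout, and whitespace tokenisation below are assumptions.

    import csv
    from gensim import corpora

    # Assume one document per row, with the text in the first column (hypothetical file).
    with open("responses.csv", newline="", encoding="utf-8") as f:
        docs = [row[0] for row in csv.reader(f)]

    # Tokenise each row so that every row is treated as one document.
    texts = [doc.lower().split() for doc in docs]

    # The id-to-token dictionary plus the bag-of-words vectors play the role
    # of a term-document matrix for LDA.
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(tokens) for tokens in texts]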
0
votes
1 answer

LDA with topicmodels package for R, how do I get the topic probability for each term?

I'm using the topicmodels package for LDA. I would like to create a visualization that shows how related or unrelated the topics are. I envision a cluster of words that are unique to topic 1, but with a few keywords that are shared connecting to…
lmcshane
  • 1,074
  • 4
  • 14
  • 27
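The question targets the R topicmodels package; as a rough Python analog with Gensim (assuming a trained model `lda` and `dictionary` as in the sketch near the top of the page), the per-term probabilities of each topic come from the topic-word matrix.

    # Rows are topics, columns are vocabulary terms; each row sums to 1.
    topic_word = lda.get_topics()

    # Probability of every term under topic 0, labelled with the actual words.
    for word_id, prob in enumerate(topic_word[0]):
        print(dictionary[word_id], round(float(prob), 4))

    # Or just the top terms of a topic together with their probabilities.
    print(lda.show_topic(0, topn=5))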
0
votes
1 answer

R in Windows cannot handle some characters

I performed LDA in Linux and didn't get characters like "ø" in topic 2. However, when I run it in Windows, they show up. Does anyone know how to deal with this? I used the packages quanteda and topicmodels. > terms(LDAModel1,5) Topic 1 Topic 2 [1,] "car" …
user1569341
  • 333
  • 1
  • 6
  • 17
0
votes
1 answer

Collapsed Gibbs sampling in the R package lda

I've been trying to modify parts of the R package lda, specifically the slda.em function. At some point, the C function "collapsedGibbsSampler" gets called in slda.collapsed.gibbs.sampler. Does anyone have the C code for that function? I've looked…
user2592729
  • 429
  • 5
  • 16
0
votes
1 answer

How do I identify which features are being selected with LDA?

I have run LDA in MATLAB using the fitcdiscr and predict functions. I have a feeling there may be some bugs in my code, however, and as a sanity check I would like to identify which features are most heavily weighted in the classification. Can…
JP1
  • 731
  • 1
  • 10
  • 27
0
votes
0 answers

How does Spark LDA handle non-integer token counts (e.g. TF-IDF)?

I have been running a series of topic modeling experiments in Spark, varying the number of topics. So, given an RDD docsWithFeatures, I'm doing something like this: for (n_topics <- Range(65,301,5) ){ val s = n_topics.toString val lda = new…
moustachio
  • 2,924
  • 3
  • 36
  • 68
0
votes
2 answers

LDA in Spark 1.3.1: converting raw data into a term-document matrix?

I'm trying out LDA with Spark 1.3.1 in Java and got this error: Error: application failed with exception org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in…
user1569341
  • 333
  • 1
  • 6
  • 17
0
votes
1 answer

Topic proportions in my corpus?

Thanks for reading and taking the time to think about and respond to this. I am using Gensim's wrapper for Mallet (ldamallet.py), and it works like a charm. I need to get the topic proportions for my corpus (over all my documents) and I do not know…
JRun
  • 669
  • 1
  • 10
  • 17
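One way to compute corpus-wide proportions, sketched with a plain Gensim LDA model rather than the Mallet wrapper; the averaging choice and the variable names (`lda` and `corpus` from the sketch near the top of the page) are assumptions.

    import numpy as np

    totals = np.zeros(lda.num_topics)

    # Accumulate each document's topic distribution ...
    for bow in corpus:
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            totals[topic_id] += prob

    # ... then normalise to get corpus-wide topic proportions.
    proportions = totals / totals.sum()
    for topic_id, share in enumerate(proportions):
        print(f"topic {topic_id}: {share:.3f}")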
0
votes
1 answer

The accuracy of LDA prediction for new documents with Spark

I'm working with Spark MLlib and am now doing something with LDA. But when I use the code provided by Spark (see below) to predict a document that was used in training the model, the predicted document-topic result is completely different from the result of…
Carlos
  • 1
  • 2
0
votes
1 answer

How to infer the topic distribution of a new document with LDA/pLSA?

I have a question when using topic models like pLSA/LDA: how do I infer the topic distribution of a new document after we have the distribution of each word in each topic? I have tried "fold-in" Gibbs sampling with LDA, but when the unseen…
starays
  • 1
  • 2
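For comparison, Gensim's LDA implementation folds an unseen document into a trained model directly; the tokens below are an illustrative assumption, with `lda` and `dictionary` as in the sketch near the top of the page.

    # Tokens of a document that was not in the training corpus (illustrative).
    new_tokens = ["car", "engine", "road", "trip"]

    # Map the unseen document onto the existing vocabulary; out-of-vocabulary
    # words are simply dropped by doc2bow.
    new_bow = dictionary.doc2bow(new_tokens)

    # Inferred topic distribution for the new document.
    print(lda.get_document_topics(new_bow, minimum_probability=0.0))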
0
votes
1 answer

Predicting topics with LDA

I am trying to extract topic assignments from a fit I built with R's 'lda' package. I created a fit: fit <- lda.collapsed.gibbs.sampler(documents = documents, K = K, vocab = vocab, num.iterations = G, alpha = alpha, eta = eta, initial = NULL, …
Sylvia
  • 315
  • 2
  • 17
0
votes
2 answers

Topic model as a dimension reduction method for text mining -- what to do next?

My understanding of the workflow is: run LDA -> extract keywords (e.g. the top few words for each topic), and hence reduce dimension -> some subsequent analysis. My question is, if my overall purpose is to assign topics to articles in an…
nobody
  • 815
  • 1
  • 9
  • 24
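A common way to use the fitted model as a dimension-reduction step is to keep each document's topic proportions (or just its dominant topic) as the reduced features; a Gensim-based sketch under the same assumptions as the earlier snippets.

    import numpy as np

    def doc_topic_vector(bow, model):
        # Dense topic-proportion vector for one document: the reduced representation.
        vec = np.zeros(model.num_topics)
        for topic_id, prob in model.get_document_topics(bow, minimum_probability=0.0):
            vec[topic_id] = prob
        return vec

    # Documents-by-topics matrix: low-dimensional features for subsequent analysis.
    doc_topic = np.array([doc_topic_vector(bow, lda) for bow in corpus])

    # Or simply label every article with its single most probable topic.
    dominant_topic = doc_topic.argmax(axis=1)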
0
votes
1 answer

How to remove numbers and symbols from the output of LDA when using the Gensim package?

How do I remove these numbers from the output of LDA when using the Gensim package? 2015-08-25 15:26:20,439 : INFO : topic #8 (0.100): 0.038*watch + 0.020*water + 0.014*strap + 0.011*analog + 0.011*resistance + 0.010*atm + 0.010*coloured + 0.010*timepiece +…
Thomas N T
  • 459
  • 1
  • 3
  • 14
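The numbers in that log line are the per-word probabilities printed by Gensim; one way to display only the words is to read the (word, probability) pairs and drop the weights (assuming a trained model `lda` as in the sketch near the top of the page).

    # show_topic returns (word, probability) pairs, so the weights can simply be dropped.
    for topic_id in range(lda.num_topics):
        words = [word for word, prob in lda.show_topic(topic_id, topn=10)]
        print(f"topic #{topic_id}: {' '.join(words)}")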
0
votes
1 answer

Generating documents from LDA topic model

I'm learning a topic model from a set of documents and that's working well. But I'm wondering if any existing system will actually generate new documents from the topics and words in the model. I.e. say I want a new document of topic 0, will any of…
ten
  • 115
  • 1
  • 8
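Whether any existing system does this out of the box is the question; as a sketch, one can sample words from a single topic's word distribution with a Gensim model and NumPy (the function name and parameters below are hypothetical, with `lda` and `dictionary` as assumed earlier).

    import numpy as np

    def sample_document(model, dictionary, topic_id, length=20, seed=0):
        # Draw `length` words from a single topic's word distribution (a pure topic, not a mixture).
        rng = np.random.default_rng(seed)
        word_probs = model.get_topics()[topic_id]      # probability of each vocabulary word
        word_probs = word_probs / word_probs.sum()     # re-normalise against floating-point drift
        word_ids = rng.choice(len(word_probs), size=length, p=word_probs)
        return " ".join(dictionary[int(i)] for i in word_ids)

    print(sample_document(lda, dictionary, topic_id=0))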
0
votes
0 answers

DocumentTermMatrix() returns 0 terms in the tm package

I have an object like this: str(apps) chr [1:17517] "35 44 33 40 33 40 44 38 33 37 37" ... In each row, the numbers are separated by spaces. corpus <- Corpus(VectorSource(apps)) dtm <- DocumentTermMatrix(corpus) str(dtm) List of 6 $ i : int(0) $…
ysfseu
  • 666
  • 1
  • 10
  • 20