Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, and each topic emits words with certain probabilities.
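The generative story described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not a reference implementation: the two toy topics, their word probabilities, the `alpha` values, and the helper names `sample_dirichlet` and `generate_document` are all assumptions made up for the example. It only generates data and does not perform inference.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Generate one document following the LDA generative process.

    topic_word: one dict per topic, mapping word -> probability (hypothetical data)
    alpha: Dirichlet concentration parameters, one per topic
    """
    # 1. Draw the document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(topic_word)))
    assignments, words = [], []
    for _ in range(length):
        # 2. Pick the topic responsible for this word position.
        z = rng.choices(topic_ids, weights=theta)[0]
        # 3. Let that topic emit a word according to its word probabilities.
        vocab = list(topic_word[z])
        w = rng.choices(vocab, weights=[topic_word[z][v] for v in vocab])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy topics, invented for illustration only.
TOPIC_WORD = [
    {"goal": 0.5, "team": 0.3, "match": 0.2},
    {"vote": 0.5, "party": 0.3, "law": 0.2},
]

rng = random.Random(0)
theta, assignments, words = generate_document(
    TOPIC_WORD, alpha=[0.5, 0.5], length=8, rng=rng
)
print("topic mixture:", theta)
print("words:", words)
```

Concentration values below 1 (as assumed here) tend to produce mixtures dominated by a few topics, which matches the "small number of topics" assumption. Fitting an LDA model means running this process in reverse: given only the words, estimate the topic mixtures and the per-topic word probabilities.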

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.

1175 questions

0 answers

OnlineLDA in Spark: can I update the model?

In Spark 2.0.1 (pyspark), I want to train an LDA model with the online optimizer. Does this version of the optimizer make it possible to update the model each day (for example)? I'm not sure I understand the meaning of online here and its implications.…

apache-spark machine-learning lda

asked Feb 08 '17 at 20:49

Patrick

1 answer

Python Latent Dirichlet Allocation Stopped_tokens Error

My code is based on the code at: https://rstudio-pubs-static.s3.amazonaws.com/79360_850b2a69980c4488b1db95987a24867a.html. I can run my program with a small number of files; however, when I get to larger numbers of files, around 1000, I get…

python unicode lda

asked Jan 31 '17 at 01:42

Reighr Doughty

2 answers

JavaLDAExample doesn't work

I am new to Spark and I am using spark-2.1.0-bin-hadoop2.7. I have checked its WordsCount sample and it works fine, but JavaLDAExample does not. I checked their source code here. WordsCount requires a URL as a parameter for its data and I have…

java hadoop apache-spark apache-spark-mllib lda

asked Jan 29 '17 at 22:41

amir golkar

1 answer

Different approaches for document similarity (LDA, LSA, cosine)

I have a set of short documents (1 or 2 paragraphs each). I have used three different approaches for document similarity:
- simple cosine similarity on the tfidf matrix
- applying LDA on the whole corpus and then using the LDA model to create the vector for…

text similarity lda trigonometry lsa

asked Jan 05 '17 at 20:38

Eli

0 answers

Input for spark.lda

I am trying to do LDA topic analysis using SparkR. I am not sure what the format of the input file should be. I have a cleaned text file (I am working with the 20 Newsgroups dataset) which I created in R. I saved it as CSV, and then read it with read.df to have a…

r lda sparkr

asked Dec 30 '16 at 17:59

Andres

1 answer

Is it possible to find the posterior probability of topics generated with LDAvis occurring in a given document? How, if so?

As may or may not be evident from the question, I'm pretty new to R and I could do with a bit of help on this. When creating topic models, I've experimented with LDA and LDAvis - code in (A) and (B) below. LDA in (A) allows me to find the posterior…

r probability lda text-analysis topicmodels

asked Dec 28 '16 at 10:27

Gazzer

1 answer

install package lda and pyprind

I use Python 3 on Jupyter from Anaconda 2.3.0. I have installed LDA (Latent Dirichlet Allocation) https://pypi.python.org/pypi/lda#downloads and pyprind using pip install lda and pip install pyprind. The installation seems successful, but…

python installation anaconda lda

asked Dec 16 '16 at 23:22

TripleH

1 answer

Downloading the image produced by LDAvis library

I am using the topic visualization library LDAvis:

    ## visualization of the topics
    import pyLDAvis
    import pyLDAvis.gensim
    pyLDAvis.enable_notebook()
    pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary)

which produces an image of the Principal…

python ipython gensim lda

asked Dec 12 '16 at 12:55

Economist_Ayahuasca

1 answer

Effectively turning strings into unicode for python 2.7

I'm following a tutorial on LDA and encountering a problem, since the tutorial is made in Python 3 and I'm working in 2.7 (the tutorial claims to work in both). As far as I understand, I need to turn strings into unicode in Python 2.x before I can…

python lda python-unicode isnumeric

asked Dec 07 '16 at 16:16

WiggyStardust

1 answer

caret dummy-vars exclude target

How can I use dummy vars in caret without destroying my target variable?

    set.seed(5)
    data <- ISLR::OJ
    data <- na.omit(data)
    dummies <- dummyVars(Purchase ~ ., data = data)
    data2 <- predict(dummies, newdata = data)
    split_factor = 0.5
    n_samples =…

r r-caret lda

asked Nov 18 '16 at 13:19

Georg Heiler

1 answer

Generative model and inference

I was looking at the hLDA model here: https://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf I have questions on how the generative model works. What will be the output of the generative model and…

lda topic-modeling

asked Oct 30 '16 at 20:43

nak15

1 answer

How to use Topic Model (LDA) output to match and retrieve new, same-topic documents

I am using an LDA model on a corpus to learn the topics covered in it. I am using the gensim package (e.g., gensim.models.ldamodel.LdaModel), but can easily use other versions of LDA if necessary. My question is what is the most efficient way to use the…

text lda topic-modeling

asked Oct 25 '16 at 18:27

iseifs

1 answer

PyLdaVis : TypeError: cannot sort an Index object in-place, use sort_values instead

I am trying to visualize LDA topics in Python using PyLDAVis but I can't seem to get it right. My model has a vocabulary of 150K words and was trained on about 16 million tokens. I am doing it outside of an IPython notebook and this is the…

python visualization lda gensim topic-modeling

asked Oct 09 '16 at 20:56

silent_dev

0 answers

Identifying interest / topic from text

I am trying to build a model that identifies the interest category / topic of supplied text. For example: Shop for Bridal Wedding Sarees from our exhausting variety of beautiful and designer sarees. Get great deals, quality…

python nltk lda gensim nltk-trainer

asked Oct 01 '16 at 12:55

GBD

0 answers

Spark LDA model prediction on new documents

I am using Spark MLlib to extract topics from documents. I ran the line below and got an LDAModel. LDAModel ldaModel = new LDA().setK(4).run(corpus); Now I have another document and want to check which topics it contains. How can that be done…

apache-spark machine-learning apache-spark-mllib lda

asked Sep 24 '16 at 15:34

Amit Kumar
