Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups; these groups account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, and each topic emits words with certain probabilities.
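The generative story described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the two topic-word distributions, and the prior `ALPHA` are made-up toy values, the helper names `sample_dirichlet` and `generate_document` are hypothetical, and the topic-word distributions are held fixed rather than drawn from their own Dirichlet prior as in the smoothed variant of the model. It shows sampling only, not inference.

```python
import random

# Toy vocabulary and topic-word distributions (illustrative values only,
# not estimated from any real corpus).
VOCAB = ["gene", "dna", "cell", "ball", "team", "score"]
TOPIC_WORD = [
    [0.40, 0.30, 0.24, 0.02, 0.02, 0.02],  # topic 0: mostly biology words
    [0.02, 0.02, 0.02, 0.30, 0.34, 0.30],  # topic 1: mostly sports words
]
ALPHA = [0.5, 0.5]  # Dirichlet prior over topics (assumed values)


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, vocab, n_words, rng):
    """Sample one document: a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topic_ids = list(range(len(topic_word)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]      # pick a topic for this word
        w = rng.choices(vocab, weights=topic_word[z])[0]  # pick a word from that topic
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


rng = random.Random(42)
theta, assignments, words = generate_document(TOPIC_WORD, ALPHA, VOCAB, 10, rng)
print("topic mixture:", [round(p, 3) for p in theta])
print("words:", " ".join(words))
```

Fitting an LDA model runs this process in reverse: given only the words, it estimates the topic mixtures and topic-word distributions that plausibly produced them.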

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.

1175 questions
0 votes, 0 answers

OnlineLDA in Spark: can I update the model?

In Spark 2.0.1 (pyspark), I want to train an LDA model with the online optimizer. Does this version of the optimizer make it possible to update the model each day (for example)? I'm not sure I understand the meaning of online here and its implications.…
Patrick
0 votes, 1 answer

Python Latent Dirichlet Allocation Stopped_tokens Error

My code is based on the code at: https://rstudio-pubs-static.s3.amazonaws.com/79360_850b2a69980c4488b1db95987a24867a.html I can run my program with a smaller number of files; however, when I get to larger numbers of files, around 1000, I get…
0 votes, 2 answers

JavaLDAExample doesn't work

I am new to Spark and I am using spark-2.1.0-bin-hadoop2.7. I have checked its WordsCount sample and it works fine, but JavaLDAExample does not. I checked their source code here. WordsCount requires a URL as a parameter for its data and I have…
0 votes, 1 answer

Different approaches for document similarity (LDA, LSA, cosine)

I have a set of short documents (1 or 2 paragraphs each). I have used three different approaches for document similarity: - simple cosine similarity on the tfidf matrix - applying LDA on the whole corpus and then using the LDA model to create the vector for…
Eli
0 votes, 0 answers

Input for spark.lda

I am trying to do LDA topic analysis using SparkR. I am not sure what format the input file should have. I have a cleaned text file (I am working with the 20 Newsgroups data) which I created in R. I saved it as CSV, and then read it with read.df to have a…
Andres
0 votes, 1 answer

Is it possible to find the posterior probability of topics generated with LDAvis occurring in a given document? How, if so?

As may or may not be evident from the question, I'm pretty new to R and I could do with a bit of help on this. When creating topic models, I've experimented with LDA and LDAvis - code in (A) and (B) below. LDA in (A) allows me to find the posterior…
Gazzer
0 votes, 1 answer

install package lda and pyprind

I use Python 3 on Jupyter from Anaconda 2.3.0. I have installed LDA (Latent Dirichlet Allocation) https://pypi.python.org/pypi/lda#downloads and pyprind using pip install lda and pip install pyprind. The installation seems successful, but…
TripleH
0 votes, 1 answer

Downloading the image produced by LDAvis library

I am using the topic visualization library LDAvis: ## visualization of the topics import pyLDAvis import pyLDAvis.gensim pyLDAvis.enable_notebook() pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary) which produces an image of the Principal…
Economist_Ayahuasca
0 votes, 1 answer

Effectively turning strings into unicode for python 2.7

I'm following a tutorial on LDA and encountering a problem, since the tutorial was written for Python 3 and I'm working in 2.7 (the tutorial claims to work in both). As far as I understand I need to turn strings into unicode in Python 2.x before I can…
WiggyStardust
0 votes, 1 answer

caret dummy-vars exclude target

How can I use dummy vars in caret without destroying my target variable? set.seed(5) data <- ISLR::OJ data<-na.omit(data) dummies <- dummyVars( Purchase ~ ., data = data) data2 <- predict(dummies, newdata = data) split_factor = 0.5 n_samples =…
Georg Heiler
0 votes, 1 answer

Generative model and inference

I was looking at the hLDA model here: https://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf I have questions on how the generative model works. What will be the output of the generative model and…
nak15
0 votes, 1 answer

How to use Topic Model (LDA) output to match and retrieve new, same-topic documents

I am using an LDA model on a corpus to learn the topics covered in it. I am using the gensim package (e.g., gensim.models.ldamodel.LdaModel) but can easily use other versions of LDA if necessary. My question is what is the most efficient way to use the…
iseifs
0 votes, 1 answer

PyLdaVis : TypeError: cannot sort an Index object in-place, use sort_values instead

I am trying to visualize LDA topics in Python using PyLDAVis but I can't seem to get it right. My model has a vocabulary of 150K words and about 16 million tokens were used to train it. I am doing it outside of an iPython notebook and this is the…
silent_dev
0 votes, 0 answers

Identifying interest / topic from text

I am trying to build a model that identifies the interest category / topic of supplied text. For example: Shop for Bridal Wedding Sarees from our exhausting variety of beautiful and designer sarees. Get great deals, quality…
GBD
0 votes, 0 answers

Spark LDA model prediction on new documents

I am using Spark MLlib to extract topics from documents. I ran the line below and got an LDAModel. LDAModel ldaModel = new LDA().setK(4).run(corpus); Now I have another document and want to check which topics it contains. How can that be done…
Amit Kumar