Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative statistical model that explains sets of observations through unobserved (latent) groups, which account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics. In other words, LDA represents each document as a mixture of topics, and each topic emits words with its own probabilities.
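To make this generative story concrete, here is a minimal sketch in plain Python (standard library only) that samples a toy corpus the way LDA assumes documents are produced. The function names, the symmetric priors alpha and beta, their default values, and the toy vocabulary are illustrative assumptions, not taken from any particular LDA library.

```python
import random


def sample_dirichlet(rng, alpha, k):
    """Draw from a symmetric Dirichlet(alpha) over k outcomes by normalising Gamma samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_lda_corpus(n_docs, doc_len, n_topics, vocab, alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (hypothetical helper).

    Returns (docs, theta, phi): docs[d] is a list of words, theta[d] is document d's
    mixture over topics, and phi[k] is topic k's distribution over vocab.
    """
    rng = random.Random(seed)
    # Each topic is a probability distribution over the whole vocabulary.
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(n_topics)]
    # Each document gets its own mixture of topics.
    theta = [sample_dirichlet(rng, alpha, n_topics) for _ in range(n_docs)]
    docs = []
    for d in range(n_docs):
        words = []
        for _ in range(doc_len):
            # Choose which of the document's topics produces this word...
            k = rng.choices(range(n_topics), weights=theta[d])[0]
            # ...then let that topic emit a word according to its own probabilities.
            words.append(rng.choices(vocab, weights=phi[k])[0])
        docs.append(words)
    return docs, theta, phi


if __name__ == "__main__":
    toy_vocab = ["island", "world", "ocean", "market", "price", "trade"]  # hypothetical vocabulary
    docs, theta, phi = generate_lda_corpus(n_docs=3, doc_len=8, n_topics=2, vocab=toy_vocab)
    for words, mixture in zip(docs, theta):
        print([round(p, 2) for p in mixture], " ".join(words))
```

Fitting an LDA model (with gensim, Mallet, or R's topicmodels, for example) works in the opposite direction: given only the words, it infers plausible values of theta and phi. Lowering alpha tends to concentrate each document on fewer topics.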

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
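By contrast, Linear Discriminant Analysis learns from labelled examples. The sketch below is a minimal two-class Fisher linear discriminant on 2-D points in plain Python; the function names, the restriction to two classes and two features, and the equal-prior midpoint threshold are simplifying assumptions for illustration, while full implementations handle many classes and features.

```python
def project(w, p):
    """Project a 2-D point onto the discriminant direction w."""
    return w[0] * p[0] + w[1] * p[1]


def fit_fisher_lda(class0, class1):
    """Fit a two-class Fisher linear discriminant on labelled 2-D points (hypothetical helper).

    class0 and class1 are lists of (x, y) tuples. Returns (w, threshold);
    a point p is assigned to class 1 when project(w, p) > threshold.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter S_w = [[a, b], [b, d]]: sum of (p - m)(p - m)^T per class.
    a = b = d = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter is singular; need more varied points")
    # w = S_w^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    diff0, diff1 = m1[0] - m0[0], m1[1] - m0[1]
    w = ((d * diff0 - b * diff1) / det, (a * diff1 - b * diff0) / det)
    # Assuming equal class priors, split halfway between the projected class means.
    threshold = (project(w, m0) + project(w, m1)) / 2
    return w, threshold


def predict(w, threshold, p):
    """Return the predicted class label (0 or 1) for point p."""
    return 1 if project(w, p) > threshold else 0


if __name__ == "__main__":
    # Hypothetical labelled training data: two well-separated clusters.
    w, threshold = fit_fisher_lda([(0, 0), (1, 0), (0, 1), (1, 1)],
                                  [(4, 4), (5, 4), (4, 5), (5, 5)])
    print(predict(w, threshold, (0.2, 0.3)), predict(w, threshold, (4.8, 4.1)))  # prints: 0 1
```

Unlike Latent Dirichlet Allocation, which discovers topics in unlabelled documents, this procedure cannot be fitted at all without class labels.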

1175 questions
0 votes, 0 answers

How to interpret data extracted with Latent Dirichlet allocation

I'm analyzing some files extracted with LDA. I learned the basics of LDA from http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/ and I have three files: topic ids (5x600) 59673453648 64345309472…
Lucia
0 votes, 1 answer

Different results of LDA using R(topicmodels)

I am using R topicmodels to train an LDA model from a small corpus, but every time I rerun the same code, I get different results (different topics and different topic terms). My question is why the same condition and same corpus…
Snow
0 votes, 0 answers

Projection of images in fisherspace (LDA)

In the Linear Discriminant Analysis algorithm for face recognition, the between-class scatter matrix and the within-class scatter matrix are both of size MxM (M = total number of images, C = number of classes). The fisherspace (matrix with eigenvectors as…
0 votes, 1 answer

R caret LDA error when using resampling

I am running into a problem using LDA through caret with categorical predictors. For some reason, enabling resampling throws an error that isn't very informative. Has anyone seen this before? Here is a reproducible toy…
user1642513
0 votes, 1 answer

How to match ngrams for each document in Spark LDA code

I am working with the Spark LDA sample code at https://gist.github.com/jkbradley/ab8ae22a8282b2c8ce33 and have a corpus file where each line is a document, which I read using val corpus: RDD[String] = sc.textFile("C:/corpus.txt"). I…
0 votes, 1 answer

Manually Specifying a Topic Model in R

I have a corpus of text with each line in the CSV file uniquely specifying a "topic" I am interested in. If I were to run a topic model on this corpus using an LDA or Gibbs method from either the topicmodels package or lda, as expected I would get…
william
0 votes, 0 answers

LDA Results Errors

So, I am relatively new to Gensim and to LDA in general. The problem right now is that when I run LDA on my corpus, the weights of the topics' tokens are all 0: 2015-06-15 12:21:12,439 : INFO : topic diff=0.082235, rho=0.250000 2015-06-15 12:21:12,454 :…
cs123
0 votes, 1 answer

How to find the number of documents (and fraction) per topic using LDA?

I am trying to extract topics from 7 million tweets. I treat each tweet as a document, so I stored all tweets in a file where each line (one tweet) is a document, and used this file as the input file for the Mallet API. public static…
Khaled
0 votes, 1 answer

LDA with tm package in R using bigrams

I have a CSV file with every row as a document, and I need to perform LDA on it. I have the following code: library(tm) library(SnowballC) library(topicmodels) library(RWeka) X = read.csv('doc.csv',sep=",",quote="\"",stringsAsFactors=FALSE) corpus <-…
dulla
0 votes, 1 answer

LDA generated topics

So, I am relatively new to working with gensim and LDA (I started about two weeks ago), and I am having trouble trusting these results. The following are the topics produced from 11 one-paragraph documents. topic #0 (0.500): 0.059*island + 0.059*world +…
cs123
0 votes, 1 answer

Latent Dirichlet Allocation using Gensim on more than one corpus

I have two questions related to the usage of gensim for LDA. 1) How can I create a model using one corpus, save it, and perhaps extend it later by training the model on another corpus? Is this possible? 2) Can LDA be used to classify an unseen…
Utsav T
0 votes, 0 answers

How to extract keywords from lots of documents?

I have many documents, over ten thousand (maybe more). Using Hadoop, I'd like to extract some keywords from each document, say 5 keywords per document. Each document may talk about a unique topic. My current approach is to use Latent…
HHH
0 votes, 3 answers

Tweet analysis, Python error when making dictionary for LDA

I've downloaded tweets about Amsterdam, in UTF-8, using the Twitter API for Python. Now I'm trying to make a dictionary for LDA, using this code (just a part of the code, but this is the part that causes the error): dictionary =…
mvh
0 votes, 2 answers

Classification of single sentence

I have 4 different categories and I also have around 3000 words which belong to each of these categories. Now if a new sentence comes, I am able to break the sentence into words and get more words related to it. So say for each new sentence I can…
rusty
0 votes, 1 answer

How to plot the results of an LDA

There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems. I have tried to simulate some data for you: df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2,…
user4543720