Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups that account for why some parts of the data are similar.

If observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. LDA represents documents as mixtures of topics that spit out words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
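As a rough illustration of the generative story in the description above (each document is a mixture of topics, and each word is drawn from one of those topics), here is a minimal sketch using NumPy. The function name `generate_corpus`, the symmetric Dirichlet priors `alpha` and `beta`, and all sizes are assumptions chosen for the example; they do not come from any of the questions listed here. The sketch only samples toy data and does not fit a model.

```python
import numpy as np

def generate_corpus(n_docs=5, doc_len=20, n_topics=3, vocab_size=10,
                    alpha=0.5, beta=0.5, seed=0):
    """Sample a toy corpus from an LDA-style generative process.

    All names and default values are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    # One word distribution per topic: phi[k] ~ Dirichlet(beta)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # One topic mixture per document: theta[d] ~ Dirichlet(alpha)
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs, assignments = [], []
    for d in range(n_docs):
        # Each word position first picks a topic from the document's mixture...
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        # ...then a word id from that topic's word distribution.
        w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(w)
        assignments.append(z)
    return docs, assignments, theta, phi

if __name__ == "__main__":
    docs, assignments, theta, phi = generate_corpus()
    print("topic mixture of document 0:", np.round(theta[0], 3))
    print("word ids of document 0:     ", docs[0])
```

Fitting LDA runs this story in reverse: given only the word ids, it estimates the per-document mixtures and per-topic word distributions.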

1175 questions

votes

0 answers

How to interpret data extracted with Latent Dirichlet allocation

I'm analyzing some files extracted with LDA. I learned the basics of LDA from http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/ and I have three files: topic ids (5x600) 59673453648 64345309472…

lda

asked Aug 10 '15 at 07:04

Lucia

votes

1 answer

Different results of LDA using R (topicmodels)

I am using R topicmodels to train an LDA model on a small corpus, but every time I rerun the same code I get different results (different topics and different topic terms). My question is why the same condition and same corpus…

r lda topicmodels

asked Jul 31 '15 at 09:02

Snow

votes

0 answers

Projection of images in fisherspace (LDA)

In the Linear Discriminant Analysis algorithm for face recognition, the between-class scatter matrix and within-class scatter matrix are both of size MxM (M=total number of images, C=number of classes). The fisherspace (matrix with eigenvectors as…

c++ opencv lda

asked Jul 29 '15 at 16:44

KeenLearner

votes

1 answer

R caret LDA error when using resampling

I am running into a problem using LDA through caret with categorical predictors. For some reason, enabling resampling throws an error that isn't very informative. Has anyone seen this before? Here is a reproducible toy…

r lda r-caret resampling

asked Jul 07 '15 at 15:18

user1642513

votes

1 answer

How to match ngrams for each document in Spark LDA code

I am working with the sample code for LDA in Spark given at https://gist.github.com/jkbradley/ab8ae22a8282b2c8ce33 I have a corpus file, where each line is a document, which I have read using val corpus: RDD[String] = sc.textFile("C:/corpus.txt") I…

scala apache-spark lda

asked Jun 22 '15 at 06:41

user3792686

votes

1 answer

Manually Specifying a Topic Model in R

I have a corpus of text with each line in the csv file uniquely specifying a "topic" I am interested in. If I were to run a topic model on this corpus using an LDA or Gibbs method from either the topicmodels package or lda, as expected I would get…

r tm lda topicmodels

asked Jun 15 '15 at 21:47

william

votes

0 answers

LDA Results Errors

I am relatively new to Gensim and to LDA in general. The problem right now is that when I run LDA on my corpus, the topics' token weights are all 0: 2015-06-15 12:21:12,439 : INFO : topic diff=0.082235, rho=0.250000 2015-06-15 12:21:12,454 :…

machine-learning nlp lda topic-modeling gensim

asked Jun 15 '15 at 17:42

cs123

votes

1 answer

How to find the number of documents (and fraction) per topic using LDA?

I am trying to extract topics from 7 million tweets. I treat each tweet as a document, so I stored all the tweets in a file where each line (one tweet) is a document. I used this file as the input file for the Mallet API. public static…

twitter lda topic-modeling mallet

asked Jun 13 '15 at 10:41

Khaled

votes

1 answer

LDA with tm package in R using bigrams

I have a csv with every row as a document. I need to perform LDA on it. I have the following code: library(tm) library(SnowballC) library(topicmodels) library(RWeka) X = read.csv('doc.csv',sep=",",quote="\"",stringsAsFactors=FALSE) corpus <-…

r text-mining tm tf-idf lda

asked Jun 11 '15 at 06:24

dulla

votes

1 answer

LDA generated topics

I am relatively new to gensim and LDA (I started about two weeks ago), and I am having trouble trusting these results. The following are the topics produced from 11 one-paragraph documents. topic #0 (0.500): 0.059*island + 0.059*world +…

python machine-learning lda topic-modeling gensim

asked Jun 04 '15 at 21:30

cs123

votes

1 answer

Latent Dirichlet Allocation using Gensim on more than one corpus

I have two questions related to the usage of gensim for LDA. 1) How can I create a model using one corpus, save it, and perhaps extend it later by training it on another corpus? Is that possible? 2) Can LDA be used to classify an unseen…

python lda topic-modeling gensim

asked May 31 '15 at 22:25

Utsav T

votes

0 answers

How to extract keywords from lots of documents?

I have many documents, over ten thousand (maybe more). I'd like to extract some keywords from each document, let's say 5 keywords per document, using Hadoop. Each document may talk about a unique topic. My current approach is to use Latent…

hadoop mapreduce mahout lda

asked Apr 14 '15 at 23:26

HHH

votes

3 answers

Tweet analysis, Python error when making dictionary for LDA

I've downloaded tweets about Amsterdam in UTF-8 using the Twitter API for Python. Now I'm trying to make a dictionary for LDA using this code (just a part of the code, but this is the part that causes the error): dictionary =…

python dictionary lda gensim

asked Mar 31 '15 at 13:01

mvh

votes

2 answers

Classification of single sentence

I have 4 different categories, and I also have around 3000 words which belong to each of these categories. Now if a new sentence comes in, I am able to break the sentence into words and get more words related to it. So say for each new sentence I can…

machine-learning nlp nltk lda text-classification

asked Mar 11 '15 at 11:20

rusty

votes

1 answer

How to plot the results of an LDA

There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none could solve my problem. I have two problems. I have tried to simulate some data for you: df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2,…

r lda

asked Feb 28 '15 at 12:49

user4543720
