Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, and "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
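The intuition above can be sketched in a few lines of Python. This is only an illustration, not LDA itself: the two topics and their word probabilities are hand-written, hypothetical numbers rather than learned values, each document is assumed to come from a single topic, and the names `TOPICS`, `log_likelihood` and `most_likely_topic` are made up for this example.

```python
import math
from collections import Counter

# Hypothetical per-topic word probabilities (illustrative numbers, not learned).
TOPICS = {
    "dogs": {"dog": 0.40, "bone": 0.30, "cat": 0.05, "meow": 0.05, "the": 0.20},
    "cats": {"dog": 0.05, "bone": 0.05, "cat": 0.40, "meow": 0.30, "the": 0.20},
}


def log_likelihood(words, word_probs, floor=1e-6):
    """Log-probability of the words under one topic's word distribution.

    Words missing from the topic get a small floor probability so that
    an unseen word does not produce log(0).
    """
    counts = Counter(words)
    return sum(n * math.log(word_probs.get(w, floor)) for w, n in counts.items())


def most_likely_topic(words):
    """Return the name of the topic under which the words are most probable."""
    return max(TOPICS, key=lambda name: log_likelihood(words, TOPICS[name]))


doc = "the dog chewed the bone".split()
print(most_likely_topic(doc))  # prints "dogs"
```

A real topic model such as LDA learns the word distributions from the corpus and lets each document mix several topics; libraries like Gensim or Mallet handle that estimation.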

Generative models (i.e. the statistical models used for topic modeling):

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

1 answer

topic modeling using keywords for topics

I need to do topic modeling in the following manner, e.g. I need to extract 5 topics from a single document. I have keywords for the 5 topics, and I need to extract the topics related to these 5 keywords. The keywords for…

lda topic-modeling

asked Oct 04 '14 at 17:51

user2876812

1 answer

Term weighting for original LDA in gensim

I am using the gensim library to apply LDA to a set of documents. Using gensim I can apply LDA to a corpus whatever the term weights are: binary, tf, tf-idf... My question is, what is the term weighting that should be used for the original LDA? If…

python lda topic-modeling gensim

asked Sep 18 '14 at 14:28

papafe

0 answers

Matrix whose rows have different column names in R

I'd like to have a matrix-like data structure in R, where each row has different column names. Essentially, I'd like almost a list of dictionaries. Consider the following code: x <- c(.5, .3, .2) y <- c(.1, .6, .3) names(x) <- c("foo", "bar",…

r lda topic-modeling

asked Aug 10 '14 at 17:22

sinwav

1 answer

Infer LDA models

I'm new to LDA and topic modeling and I would like to understand the inference mechanism. I would like to apply LDA on activity recognition. Say that I have defined 10 topics composed by a probability distribution of events. for example TOPIC_1 =…

lda topic-modeling

asked Jul 22 '14 at 13:14

gabboshow

2 answers

Cannot run Mallet TopicModel

I am trying to run Mallet's topic modelling but got the following error: Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file. Perhaps the 'resources' directories weren't copied into the 'class'…

java topic-modeling mallet

asked Jul 03 '14 at 21:16

Ashkan

1 answer

Mallet java: get probability distribution of a documents collection

I would like to get a single probability distribution for a collection of documents, because I need to be able to use the KL divergence. Is this possible? In this example: http://mallet.cs.umass.edu/topics-devel.php with the method…

java topic-modeling mallet

asked Jun 13 '14 at 08:46

Enzo

2 answers

Topic Modelling and finding similarity in topics

Problem statement: I have several documents (20k documents). I need to apply topic modelling to find similar documents and then analyze those similar documents to find how they differ from each other. Q: Could anyone suggest any Topic…

topic-modeling gensim mallet

asked May 05 '14 at 13:34

user3421622

1 answer

Mallet Api - Get consistent results

I am new to LDA and Mallet. I have the following query: I tried running Mallet LDA from the command line and, by setting --random-seed to a fixed value, I was able to get consistent results across multiple runs of the algorithm. However, I did try…

lda topic-modeling mallet

asked May 01 '14 at 23:25

Uno

2 answers

how to add words into documents in corpus?

I'm using the tm package to run LDA on my corpus. I have a corpus containing 10,000 documents.

    rtcorpus.4star <- Corpus(DataframeSource(rt.subset.4star))  ## creates the corpus
    rtcorpus.4star[[1]]  ## accesses the first document

I'm trying to write a…

r algorithm data-mining topic-modeling

asked Apr 05 '14 at 23:06

user2303557

1 answer

Issues in using lda for vowpal wabbit

I am trying to use the Vowpal Wabbit LDA model, but I am getting very bad results. I think there is something wrong with the process I am following. I have a vocabulary size of 100000. I run the code like this: vw --data train.txt --lda 50 --lda_alpha…

topic-modeling

asked Apr 04 '14 at 19:38

user34790

0 answers

Labeled Latent Dirichlet Allocation input values

I am doing tag prediction and keyword extraction on StackExchange posts. I have ~36,000 posts consisting of title, body and tags. I process them, filtering out noisy elements. After this I perform Labeled Latent Dirichlet Allocation (LLDA) which…

java machine-learning text-analysis topic-modeling

asked Mar 24 '14 at 12:39

RazorAlliance192

1 answer

sLDA: how many values may the response variable have?

I am trying to understand in general how sLDA works. In contrast to LDA, it has 'a response variable associated with each document'. Is each document in the training set labeled by just one topic, or might it be labeled by multiple topics? If it must use just…

lda topic-modeling

asked Mar 18 '14 at 20:19

mariaza

1 answer

Storm and stop words

I am new to the Storm framework (https://storm.incubator.apache.org/about/integrates.html). I tested my code locally and I think that if I remove stop words it will perform well, but I searched online and can't find any example of removing stop words in…

data-mining apache-storm stop-words topic-modeling

asked Mar 12 '14 at 11:29

aysudy

0 answers

Topic Modelling using RPy2

I wish to use LDA in Python using RPy. I have already tried this using gensim package but I still wish to try RPy2 out. While using R I use this code: library(RTextTools) library(topicmodels) library(tm) ...Get Data Here and Store to…

python r rpy2 lda topic-modeling

asked Feb 19 '14 at 20:25

Animesh Pandey

2 answers

Work-around to clear blank entries in a document term matrix?

I have some R code that I've used in the past to produce topic models. Everything was working fine until I updated all of my R packages in the hopes of fixing a slightly unrelated problem. Now, code which had previously worked seems to be broken…

r tm lda topic-modeling

asked Jan 31 '14 at 04:55

beniam
