Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).

Generative models (i.e. the statistical models used for topic modelling)

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
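
The "dog"/"cat" intuition above can be sketched as the generative story that LDA assumes. This is only a toy illustration, not a fitted model: the `TOPICS` table, its probabilities, and the helper names `sample_dirichlet` and `generate_document` are hypothetical and chosen for this example. It uses only the Python standard library.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
# The words and probabilities are illustrative assumptions, not learned values.
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.3, "walk": 0.2},
    "cats": {"cat": 0.5, "meow": 0.3, "nap": 0.2},
}


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, rng):
    """Generate one document following the LDA generative story."""
    names = list(TOPICS)
    # 1. Draw the document's own mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position according to the mixture.
        topic = rng.choices(names, weights=theta)[0]
        # 3. Pick a word from that topic's word distribution.
        vocab = list(TOPICS[topic])
        weights = list(TOPICS[topic].values())
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


rng = random.Random(0)  # fixed seed so repeated runs give the same document
theta, words = generate_document(10, [0.5, 0.5], rng)
print(dict(zip(TOPICS, theta)))
print(words)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the per-document mixtures and the per-topic word distributions.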

980 questions
0 votes, 1 answer

topic modeling using keywords for topics

I need to do topic modeling in the following manner: e.g., I need to extract 5 topics from a single document. I have the keywords for 5 topics, and I need to extract the topics related to these 5 keywords. The keywords for…
user2876812
0 votes, 1 answer

Term weighting for original LDA in gensim

I am using the gensim library to apply LDA to a set of documents. Using gensim I can apply LDA to a corpus whatever the term weights are: binary, tf, tf-idf... My question is, what is the term weighting that should be used for the original LDA? If…
papafe
0 votes, 0 answers

Matrix whose rows have different column names in R

I'd like to have a matrix-like data structure in R, where each row has different column names. Essentially, I'd like almost a list of dictionaries. Consider the following code: x <- c(.5, .3, .2) y <- c(.1, .6, .3) names(x) <- c("foo", "bar",…
sinwav
0 votes, 1 answer

Infer LDA models

I'm new to LDA and topic modeling and I would like to understand the inference mechanism. I would like to apply LDA to activity recognition. Say that I have defined 10 topics, each composed of a probability distribution over events. For example, TOPIC_1 =…
gabboshow
0 votes, 2 answers

Cannot run Mallet TopicModel

I am trying to run Mallet's topic modelling but got the following error: Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file. Perhaps the 'resources' directories weren't copied into the 'class'…
Ashkan
0 votes, 1 answer

Mallet java: get probability distribution of a documents collection

I would like to get a single probability distribution for a collection of documents, as I need to be able to use the KL divergence. Is this possible? In this example: http://mallet.cs.umass.edu/topics-devel.php with the method…
Enzo
0 votes, 2 answers

Topic Modelling and finding similarity in topics

Problem statement: I have several documents (20k documents). I need to apply topic modelling to find similar documents and then analyze those similar documents to find how they differ from each other. Q: Could anyone suggest any Topic…
user3421622
0 votes, 1 answer

Mallet Api - Get consistent results

I am new to LDA and Mallet. I have the following query: I tried running Mallet-LDA from the command line, and by setting --random-seed to a fixed value I was able to get consistent results across multiple runs of the algorithm. However, I did try…
Uno
0 votes, 2 answers

how to add words into documents in corpus?

I'm using the tm package to run LDA on my corpus. I have a corpus containing 10,000 documents. rtcorpus.4star <- Corpus(DataframeSource(rt.subset.4star)) ##creates the corpus rtcorpus.4star[[1]] ##accesses the first document I'm trying to write a…
user2303557
0 votes, 1 answer

Issues in using lda for vowpal wabbit

I am trying to use the vowpal wabbit lda model, but I am getting very bad results. I think there is something wrong with the process I am following. I have a vocabulary size of 100000. I run the code like this: vw --data train.txt --lda 50 --lda_alpha…
user34790
0 votes, 0 answers

Labeled Latent Dirichlet Allocation input values

I am doing Tag Prediction and Keyword Extraction on StackExchange posts. I have ~36,000 posts consisting of title, body and tags. I process them, filtering out noisy elements. After this I perform Labeled Latent Dirichlet Allocation (LLDA) which…
0 votes, 1 answer

sLDA: how many values may the response variable have?

I am trying to understand in general how sLDA works. In contrast to LDA, it has 'a response variable associated with each document'. Is each document in the training set labeled with just one topic, or might it be labeled with multiple topics? If it must use just…
mariaza
0 votes, 1 answer

Storm and stop words

I am new to the Storm framework (https://storm.incubator.apache.org/about/integrates.html). I tested my code locally, and I think it will perform well if I remove stop words, but I searched online and can't find any example of removing stopwords in…
aysudy
0 votes, 0 answers

Topic Modelling using RPy2

I wish to use LDA in Python using RPy. I have already tried this using the gensim package but I still wish to try RPy2 out. While using R I use this code: library(RTextTools) library(topicmodels) library(tm) ...Get Data Here and Store to…
Animesh Pandey
0 votes, 2 answers

Work-around to clear blank entries in a document term matrix?

I have some R code that I've used in the past to produce topic models. Everything was working fine until I updated all of my R packages in the hopes of fixing a slightly unrelated problem. Now, code which had previously worked seems to be broken…
beniam