Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
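
The intuition about word frequencies can be illustrated with a small sketch. Note that this is not a topic model: the topic labels here are given by hand, whereas a real topic model would have to infer them. The toy corpus and the helper name `word_frequencies_by_topic` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each document is hand-labeled with its topic.
docs = [
    ("dogs", "the dog chewed the bone and the dog barked"),
    ("dogs", "a dog buried a bone"),
    ("cats", "the cat said meow and the cat slept"),
    ("cats", "meow went the cat"),
]


def word_frequencies_by_topic(labeled_docs):
    """Return {topic: {word: relative frequency}} for labeled documents."""
    counts = {}
    for topic, text in labeled_docs:
        counts.setdefault(topic, Counter()).update(text.split())
    freqs = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        freqs[topic] = {word: n / total for word, n in counter.items()}
    return freqs


freqs = word_frequencies_by_topic(docs)
for topic, word_freqs in freqs.items():
    # Show the three most frequent words per topic.
    top = sorted(word_freqs.items(), key=lambda kv: -kv[1])[:3]
    print(topic, top)
```

In this toy data "dog" and "bone" only occur in the documents labeled "dogs", and "cat" and "meow" only in those labeled "cats", which is the kind of pattern a topic model tries to recover without being given the labels.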

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
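
As a rough sketch of what "generative" means for LDA: each document draws its own mixture over topics from a Dirichlet distribution, and each word is then produced by first picking a topic from that mixture and then picking a word from that topic's word distribution. The vocabulary, the two fixed topics, and the function names below are assumptions for illustration only. In full LDA the topic-word distributions are themselves drawn from a Dirichlet prior rather than fixed by hand.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocab, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # Per-document topic mixture.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        words.append(w)
    return theta, words


# Hypothetical vocabulary and two hand-written topics (rows sum to 1).
vocab = ["dog", "bone", "cat", "meow"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # a "dogs" topic
    [0.0, 0.0, 0.5, 0.5],  # a "cats" topic
]

rng = random.Random(0)
theta, words = generate_document(
    topic_word_probs, vocab, alpha=[0.5, 0.5], length=8, rng=rng
)
print(theta, words)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the topic mixtures and the topic-word distributions.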

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

1 answer

Using LDA (topic model): the distribution of each topic over words is similar and "flat"

Latent Dirichlet Allocation (LDA) is a topic model to find latent variables (topics) underlying a bunch of documents. I'm using the Python gensim package and having two problems: I printed out the most frequent words for each topic (I tried 10, 20, 50…

asked Feb 23 '15 at 15:34

Ruby

1 answer

How can I speed up a topic model in R?

Background: I am trying to fit a topic model with the following data and specification: documents = 140,000, words = 3,000, and topics = 15. I am using the package topicmodels in R (3.1.2) on a Windows 7 machine (24 GB RAM, 8 cores). My problem is that…

r machine-learning lda topic-modeling unsupervised-learning

asked Jan 26 '15 at 16:52

Adel

5 answers

Mallet topic model example cannot compile

I want to use Mallet from my Java code (instead of using the command line), so I included the jar in my project and copied the example code from http://mallet.cs.umass.edu/topics-devel.php. However, when I run this code, there is an error that…

topic-modeling mallet

asked Aug 18 '14 at 05:31

flyingmouse

3 answers

Text Clustering and topic extraction

I'm doing some text mining using the excellent scikit-learn module. I'm trying to cluster and classify scientific abstracts. I'm looking for a way to cluster my set of tf-idf representations, without having to specify the number of clusters in…

python-2.7 scikit-learn text-mining topic-modeling

asked May 30 '13 at 08:39

Misconstruction

2 answers

Topic modelling, but with known topics?

Okay, so usually topic models (such as LDA, pLSI, etc.) are used to infer topics that may be present in a set of documents, in an unsupervised fashion. I would like to know if anyone has any ideas as to how I can shoehorn my problem into an LDA…

topic-modeling

asked May 28 '13 at 00:15

user1871183

1 answer

GSDMM Convergence of Clusters (Short Text Clustering)

I am using this GSDMM Python implementation to cluster a dataset of text messages. GSDMM converges fast (around 5 iterations) according to the initial paper. I also have a convergence to a certain number of clusters, but there are still a lot of…

python cluster-analysis topic-modeling convergence

asked Jun 04 '20 at 09:18

simon

1 answer

ValueError: Stop argument for islice() must be None or an integer: 0 <= x <= sys.maxsize on topic coherence

I'm following this tutorial https://towardsdatascience.com/evaluate-topic-model-in-python-latent-dirichlet-allocation-lda-7d57484bb5d0 and ran into a problem. The purpose of my code is to iterate over the range of topics, alpha, and beta…

python python-3.x long-integer python-itertools topic-modeling

asked Feb 06 '20 at 03:50

adityabrillian

1 answer

Topic Modeling in Mallet; Documentation

I'm looking for some good documentation for Mallet, specifically for its classes related to topic modeling. I've looked at the Java docs but they aren't too helpful. For example: estimate public void estimate() throws…

java mallet topic-modeling

asked Feb 25 '11 at 17:59

akobre01

2 answers

Negative Values: Evaluate Gensim LDA with Topic Coherence

I'm currently trying to evaluate my topic models with gensim topiccoherencemodel: from gensim.models.coherencemodel import CoherenceModel cm_u_mass = CoherenceModel(model = model1, corpus = corpus1, coherence = 'u_mass') coherence_u_mass =…

python-3.x gensim evaluation topic-modeling

asked May 30 '18 at 14:34

Nils_Denter

1 answer

Stem completion in R replaces names, not data

My team is doing some topic modeling on medium-sized chunks of text (tens of thousands of words), using the Quanteda package in R. I'd like to reduce words to word stems before the topic modeling process, so that I'm not counting variations on the…

r tm topic-modeling quanteda

asked Apr 04 '18 at 22:26

J. Trimarco

1 answer

How to interpret the sklearn LDA perplexity score? Why does it always increase as the number of topics increases?

I am trying to find the optimal number of topics using the LDA model of sklearn. To do this I calculate perplexity by referring to the code on https://gist.github.com/tmylk/b71bf7d3ec2f203bfce2. But when I increase the number of topics, perplexity always increases…

python scikit-learn topic-modeling perplexity

asked Aug 13 '17 at 07:08

JonghoKim

1 answer

Automatic labeling of LDA generated topics

I'm trying to categorize customer feedback and I ran an LDA in python and got the following output for 10 topics: (0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" +…

python nlp lda topic-modeling labeling

asked May 15 '17 at 17:41

Arman

1 answer

Error installing topicmodels in R on Ubuntu

I am getting an error while installing the topicmodels package in R. On running install.packages("topicmodels",dependencies=TRUE), the following are the last few lines I get. Please help. My R version is 3.1.3. g++ -I/usr/share/R/include -DNDEBUG …

r ubuntu-14.04 topic-modeling

asked Mar 13 '15 at 05:21

Mohit Mangal

3 answers

Are there any efficient python libraries for Dynamic Topic Models, preferably extending Gensim?

I'm trying to model Twitter stream data with topic models. Gensim, being an easy to use solution, is impressive in its simplicity. It has a truly online implementation for LSI, but not for LDA. For a changing content stream like Twitter, Dynamic…

python lda text-analysis topic-modeling gensim

asked Mar 18 '14 at 02:52

Ravi Karan

1 answer

hierarchical classification + topic model training data for internet articles and social media

I want to classify large numbers (100K to 1M+) of smallish internet-based articles (tweets, blog articles, news, etc) by topic. Toward this goal, I have been looking for labeled training data documents which I could use to build classifier…

nltk scikit-learn hierarchical-clustering topic-modeling training-data

asked Nov 05 '13 at 21:40

Ziggy Eunicien
