Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
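The intuition above can be illustrated with plain word counts. This is only a minimal sketch: the toy corpus and the `word_frequencies` helper are hypothetical, and the documents are labelled by hand, whereas a real topic model has to infer the topics without labels.

```python
from collections import Counter

# Hypothetical toy corpus: each document is labelled with the topic it is about.
docs = {
    "dogs": ["the dog chewed a bone", "my dog buried the bone"],
    "cats": ["the cat said meow", "a cat will meow at night"],
}

def word_frequencies(texts):
    """Relative frequency of each word across a list of texts."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# One frequency table per (hand-labelled) topic.
freqs = {topic: word_frequencies(texts) for topic, texts in docs.items()}

for topic, table in freqs.items():
    # Show the three most frequent words for each topic.
    top = sorted(table, key=table.get, reverse=True)[:3]
    print(topic, top)
```

Comparing `freqs["dogs"]` with `freqs["cats"]` shows which words are characteristic of each group; stop words such as "the" appear in both and are usually filtered out first.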

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
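As a rough sketch of what "generative" means for LDA: each document draws its own topic mixture from a Dirichlet prior, and each word is produced by first picking a topic from that mixture and then picking a word from that topic. The vocabulary, the `topic_word` table, the `alpha` values, and the helper names below are all made up for illustration; in full LDA the topic-word distributions are themselves drawn from a Dirichlet prior and everything is inferred from data rather than fixed by hand.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, vocab, alpha, length, rng):
    """Generate one document following LDA's generative story."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        words.append(w)
    return theta, words

# Hypothetical two-topic model over a four-word vocabulary.
vocab = ["dog", "bone", "cat", "meow"]
topic_word = [
    [0.45, 0.45, 0.05, 0.05],  # a "dogs" topic
    [0.05, 0.05, 0.45, 0.45],  # a "cats" topic
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
theta, words = generate_document(topic_word, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, words)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the topic mixtures and the topic-word distributions.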

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

votes

1 answer

Gensim Dictionary Implementation

I was just curious about the gensim dictionary implementation. I have the following code: def build_dictionary(documents): dictionary = corpora.Dictionary(documents) dictionary.save('/tmp/deerwester.dict') # store the dictionary …

asked Aug 12 '13 at 09:38

dmil

votes

1 answer

Representation and a good similarity measure between Tweets for topic detection

I'm planning to write a tool for topic detection on Twitter. I've been thinking about a good similarity measure (distance) between two tweets, and how to represent them, taking into account: The #hashtags (I think hashtags are very important when…

twitter machine-learning cluster-analysis information-retrieval topic-modeling

asked Feb 06 '13 at 10:06

Oscar Mederos


votes

2 answers

Latent Dirichlet Allocation Solution Example

I am trying to learn about Latent Dirichlet Allocation (LDA). I have basic knowledge of machine learning and probability theory and based on this blog post http://goo.gl/ccPvE I was able to develop the intuition behind LDA. However I still haven't…

lda topic-modeling

asked May 16 '12 at 18:48

user737128

votes

1 answer

Cast topic modeling outcome to dataframe

I have used BERTopic with KeyBERT to extract some topics from some docs: from bertopic import BERTopic topic_model = BERTopic(nr_topics="auto", verbose=True, n_gram_range=(1, 4), calculate_probabilities=True,…

python-3.x pandas nlp bert-language-model topic-modeling

asked Dec 13 '22 at 12:58

xavi

votes

1 answer

How to extract text from a two-column PDF using PDFPlumber

I am working on topic modeling tasks using Python and I would like to extract text from annual/sustainability reports. However, when I try to extract a report, the extracted lines are broken between two different columns in a page…

python text-extraction topic-modeling information-extraction pdfplumber

asked Aug 25 '21 at 08:04

Ramachandran Ravishankar

votes

1 answer

How can I replace emojis with text and treat them as single words?

I have to do topic modeling in R on pieces of text containing emojis. Using the replace_emoji() and replace_emoticon() functions lets me analyze them, but there is a problem with the results. A red heart emoji is translated as "red heart…

r emoji topic-modeling data-preprocessing

asked May 17 '21 at 19:55

TR_IBK21

votes

1 answer

Gensim module attribute error for wrappers

I am trying to find the optimal number of topics for LDA. To do this, I used Gensim as follows: def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3): coherence_values = [] model_list = [] for num_topics in…

python gensim lda topic-modeling

asked Apr 14 '21 at 16:35

Tahereh Maghsoudi

votes

4 answers

Topic modeling on short texts in Python

I want to do topic modeling on short texts. I did some research on LDA and found that it doesn't go well with short texts. What methods would be better and do they have Python implementations?

python python-3.x nlp lda topic-modeling

asked Jun 03 '20 at 14:32

Sri Test

votes

1 answer

What is the difference between LDA and NTM in Amazon Sagemaker for Topic Modeling?

I am looking for the difference between LDA and NTM. What are some use cases where you would use LDA over NTM? As per the AWS docs: LDA: The Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to…

algorithm topic-modeling

asked Nov 29 '19 at 19:15

Saurabh

votes

0 answers

Perplexity increases with number of topics

There are quite a few posts about this specific issue, but I was unable to solve this problem. I have been experimenting with LDA on the 20 Newsgroups corpus with both the scikit-learn and Gensim implementations. It is described in the literature that…

python scikit-learn lda topic-modeling perplexity

asked Jul 01 '19 at 09:44

Bas

votes

2 answers

Probabilities returned by gensim's get_document_topics method don't add up to one

Sometimes it returns probabilities for all topics and all is fine, but sometimes it returns probabilities for just a few topics and they don't add up to one. It seems to depend on the document. Generally when it returns few topics, the…

text-mining gensim lda topic-modeling

asked Jun 15 '17 at 15:36

nestor556

votes

0 answers

Firebase unsubscribe from topic does not work

I subscribed to a topic in FCM with 2 Android devices, and after I unsubscribed from the topic with one device, I could still send messages with the device that I unsubscribed.

firebase firebase-cloud-messaging topic-modeling

asked Apr 10 '17 at 16:33

Moshe Gil

votes

4 answers

pyLDAvis: unable to view the graph

I am trying to visually depict my topics in Python using pyLDAvis. However, I am unable to view the graph. Do we have to view the graph in the browser, or will it pop up upon execution? Below is my code: import pyLDAvis import…

python-3.x lda topic-modeling

asked Apr 10 '17 at 07:21

Deepa Huddar

votes

2 answers

Using Topic Model, how should we set up a "stop words" list?

There are some standard stop lists, giving words like "a the of not" to be removed from the corpus. However, I'm wondering, should the stop list change case by case? For example, I have 10K articles from a journal, then because of the structure of an…

stop-words lda topic-modeling text-classification

asked Feb 24 '15 at 18:09

Ruby

votes

2 answers

How to parallelize topicmodels R package

I have a series of documents (~50,000), that I've transformed into a corpus and have been building LDA objects using the topicmodels package in R. Unfortunately, in order to test more than 150 topics, it takes several hours. So far, I've found that…

r parallel-processing lda topic-modeling

asked Jan 22 '15 at 13:26

Optimus

