Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
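
As a rough illustration of what "generative" means here, the following is a minimal sketch of the LDA generative story using only the Python standard library. The two topics, their word probabilities, and the helper names (`sample_dirichlet`, `generate_document`) are made up for this example; in full LDA the per-topic word distributions are themselves drawn from a Dirichlet prior rather than fixed by hand, and the practical task is the reverse one of inferring topics from observed documents.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document: pick topic proportions, then a topic and a word per position."""
    topic_names = list(topics)
    # Per-document topic proportions (theta), drawn from a Dirichlet prior.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Choose a topic for this word position according to theta.
        topic = rng.choices(topic_names, weights=theta)[0]
        vocab = list(topics[topic])
        weights = list(topics[topic].values())
        # Choose a word from the selected topic's word distribution.
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words


# Hypothetical, hand-written topic-word distributions (an assumption for illustration).
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.4, "meow": 0.1},
    "cats": {"cat": 0.5, "meow": 0.4, "bone": 0.1},
}

rng = random.Random(0)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=10, rng=rng)
print([round(p, 2) for p in theta], words)
```

A small `alpha` (below 1) tends to produce documents dominated by one topic, which matches the intuition that a document about dogs mostly contains dog-related words.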

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

3 answers

Best model for topic spotting/discovery

What is the best model for topic spotting within short unstructured documents, e.g. SMS or Twitter messages? Latent Dirichlet allocation?

nlp keyword information-retrieval information-extraction topic-modeling

asked Oct 06 '11 at 17:12

user152949

1 answer

Top2Vec reassign topics to original df

I have trained a topic model using Top2Vec as follows: import pandas as pd from top2vec import Top2Vec df = data = [['1', 'Beautiful hotel, really enjoyed my stay'], ['2', 'We had a terrible experience. Will not return.'], ['3', 'Lovely hotel. The…

python topic-modeling

asked Oct 11 '22 at 10:29

peanutbutterandjellyfish

1 answer

Yahoo! LDA Implementation Questions

All, I have been running Y!LDA (https://github.com/shravanmn/Yahoo_LDA) on a set of documents and the results look great (or at least what I would expect). Now I want to use the resulting topics to perform a reverse query against the corpus. Does…

yahoo lda topic-modeling

asked Sep 11 '11 at 21:09

aeupinhere

1 answer

How to get topic-probs matrix in bertopic modeling

I ran BERTopic to get topics for 3,500 documents. How could I get the topic-probs matrix for each document and export them to csv? When I export them, I want to export the identifier of each document too. I tried two approaches: First, I found…

python nlp bert-language-model topic-modeling

asked Sep 19 '22 at 05:01

JJD

0 answers

NLP - Extract main actions/tasks from unstructured sentences

I have a lot of unstructured data that conveys a set of certain actions. For example: Sentence 1: build and paint chain link fence with black coating, post and rail to be red coat Sentence 2: new roll door and temp slide door - additional ac…

python nlp data-science topic-modeling

asked Aug 08 '22 at 15:11

Animeartist

1 answer

Recreating the pyLDAvis chart in Altair - filtered data with empty selection

I am trying to recreate the classic pyLDAvis visualization for topic modelling in Altair. I've hit a snag when it comes to filtering. In the pyLDAvis chart, an empty selection in the scatter chart shows the so-called "Default" topic in the right…

python topic-modeling altair vega-lite pyldavis

asked Jun 11 '21 at 00:12

campo

0 answers

choose the best Coherence Score for LDA model

I am using the Python Gensim package to build an LDA model…

python gensim lda topic-modeling

asked Oct 16 '20 at 13:38

almegdadi

1 answer

How to measure how distinct a document is based on predefined linguistic categories?

I have 3 categories of words that correspond to different types of psychological drives (need-for-power, need-for-achievement, and need-for-affiliation). Currently, for every document in my sample (n=100,000), I am using a tool to count the number…

nlp data-science topic-modeling cosine-similarity word-embedding

asked May 27 '20 at 08:07

SanMelkote

3 answers

How to fix LDA model coherence score runtime Error?

text='Alice is a student.She likes studying.Teachers are giving a lot of homewok.' I am trying to get topics from a simple text (like the one above) with a coherence score. This is my LDA model: id2word = corpora.Dictionary(data_lemmatized) texts =…

python nlp runtime-error lda topic-modeling

asked May 17 '20 at 13:50

Xena

2 answers

Each row of the input matrix needs to contain at least one non-zero entry

I have this issue when I run this chunk of code: text_lda <- LDA(text_dtm, k = 2, method = "VEM", control = NULL) I get the following error: "Each row of the input matrix needs to contain at least one non-zero entry" Then I tried to solve this with…

r memory lda topic-modeling

asked Jan 17 '20 at 22:04

coding

1 answer

PyLDAvis visualisation does not align with generated topics

I am using PyLDAvis to visualise the results of the LDA from Mallet. Before I can do that, I need the wrapper of the gensim library: model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(model_list[8]) When I print the found topics, they…

python gensim lda topic-modeling mallet

asked Dec 13 '19 at 12:14

gython

1 answer

Structural Topic Modeling in R: Plot statistical significance for Topic Content

My question relates to structural topic modeling in R, specifically to the stm package developed by Roberts et al. (https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf). I implemented a structural topic model in order to…

r nlp topic-modeling topicmodels

asked Oct 20 '19 at 11:17

RAnnR

1 answer

Structural Topic Modeling (stm) Error in makeTopMatrix(prevalence, data) : Error creating model matrix

I'm trying to run the initial steps of this stm tutorial https://github.com/dondealban/learning-stm with this dataset, which is part of the original…

r topic-modeling

asked Mar 04 '19 at 12:15

Ana

0 answers

What is a "good" value for LSI topic coherence?

I'm using the gensim Python library to work on small corpora (around 1500 press articles each time). Let's say I'm interested in creating clusters of articles relating to the same news. So for each corpus of articles I've tokenized, detected…

python gensim topic-modeling latent-semantic-indexing

asked Jan 28 '19 at 05:18

fbparis

0 answers

Assign more weight to certain documents within the corpus - LDA - Gensim

I am using LDA for topic modelling but unfortunately my data is heavily skewed. I have documents from 10 different categories and would like each category to equally contribute to the LDA topics. However, each category has a varying number of…

python-3.x nlp gensim lda topic-modeling

asked Dec 19 '18 at 13:15

Mia
