Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, and "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).

Generative models commonly used for topic modelling

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
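In LDA, each document has a mixture of topics and each topic is a distribution over words; every token is generated by first drawing a topic from the document's mixture and then a word from that topic. One standard way to fit this model is collapsed Gibbs sampling. The sketch below is a minimal, unoptimized illustration in plain Python, not how Gensim or MALLET implement LDA internally; the function name `lda_gibbs` and the hyperparameter defaults (alpha=0.1, beta=0.01, 200 iterations) are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling (illustrative sketch).

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order used in phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token's assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimates of the document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
        for t in range(K)
    ]
    return theta, phi, vocab


docs = [
    ["dog", "bone", "dog", "bark"],
    ["cat", "meow", "cat", "purr"],
    ["dog", "bark", "bone"],
    ["cat", "purr", "meow"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print(t, [vocab[v] for v in top])
```

On this toy corpus the two topics tend to separate the dog words from the cat words, but with so little data the result depends on the seed.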

980 questions

3 votes, 3 answers

Best model for topic spotting/discovery

What is the best model for topic spotting within short unstructured documents, e.g. SMS or Twitter messages? Is it latent Dirichlet allocation?

3 votes, 1 answer

Top2Vec reassign topics to original df

I have trained a topic model using Top2Vec as follows: import pandas as pd from top2vec import Top2Vec df = data = [['1', 'Beautiful hotel, really enjoyed my stay'], ['2', 'We had a terrible experience. Will not return.'], ['3', 'Lovely hotel. The…

3 votes, 1 answer

Yahoo! LDA Implementation Questions

All, I have been running Y!LDA (https://github.com/shravanmn/Yahoo_LDA) on a set of documents and the results look great (or at least what I would expect). Now I want to use the resulting topics to perform a reverse query against the corpus. Does…
aeupinhere (2,883 reputation)
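One common way to query a corpus "in reverse" with a fitted topic model is to rank documents by their inferred proportion of a chosen topic. This sketch assumes you have already parsed Y!LDA's per-document topic output into one list of topic proportions per document; that file format and the parsing step are not shown, and `top_documents_for_topic` is a hypothetical helper.

```python
def top_documents_for_topic(doc_topics, topic, n=5, doc_ids=None):
    """Return the n documents with the highest proportion of `topic`.

    doc_topics: one list of topic proportions per document
    doc_ids: optional identifiers aligned with doc_topics (defaults to row index)
    """
    if doc_ids is None:
        doc_ids = list(range(len(doc_topics)))
    # Sort documents by their weight on the requested topic, highest first.
    ranked = sorted(zip(doc_ids, doc_topics), key=lambda pair: pair[1][topic], reverse=True)
    return [(doc_id, props[topic]) for doc_id, props in ranked[:n]]


doc_topics = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
result = top_documents_for_topic(doc_topics, topic=1, n=2, doc_ids=["a", "b", "c"])
print(result)  # [('b', 0.8), ('c', 0.4)]
```

To query with new text rather than a topic number, you would first infer the query's topic mixture and then compare it to each document's mixture, for example with Hellinger distance.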

3 votes, 1 answer

How to get the topic-probs matrix in BERTopic modeling

I ran BERTopic to get topics for 3,500 documents. How could I get the topic-probs matrix for each document and export them to csv? When I export them, I want to export the identifier of each document too. I tried two approaches: First, I found…
JJD (31 reputation)
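A minimal sketch of the export step using Python's standard `csv` module. It assumes the BERTopic model was created with `calculate_probabilities=True`, so that `fit_transform` returns a documents-by-topics probability matrix, and that your document identifiers sit in a list aligned with the documents you passed in; `write_topic_probs` and the file name are hypothetical.

```python
import csv


def write_topic_probs(path, doc_ids, probs):
    """Write one row per document: its identifier, then one probability per topic."""
    if len(doc_ids) != len(probs):
        raise ValueError("doc_ids and probs must have the same number of rows")
    num_topics = len(probs[0]) if len(probs) else 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["doc_id"] + [f"topic_{t}" for t in range(num_topics)])
        for doc_id, row in zip(doc_ids, probs):
            # float() also converts NumPy scalars if probs is a NumPy array.
            writer.writerow([doc_id] + [float(p) for p in row])


# Stand-in for the matrix returned by fit_transform (assumed docs x topics).
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
write_topic_probs("topic_probs.csv", ["doc-001", "doc-002"], probs)
```

Keeping the identifier as the first column means the CSV can be joined back to the original data on `doc_id` without relying on row order.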

3 votes, 0 answers

NLP - Extract main actions/tasks from unstructured sentences

I have a lot of unstructured data that conveys a set of certain actions. For example: Sentence 1: build and paint chain link fence with black coating, post and rail to be red coat Sentence 2: new roll door and temp slide door - additional ac…
Animeartist (1,047 reputation)

3 votes, 1 answer

Recreating the pyLDAvis chart in Altair - filtered data with empty selection

I am trying to recreate the classic pyLDAvis visualization for topic modelling in Altair. I've hit a snag when it comes to filtering. In the pyLDAvis chart, an empty selection in the scatter chart shows the so-called "Default" topic in the right…
campo (624 reputation)
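The right-hand bar chart in pyLDAvis ranks terms by one of two quantities: saliency when no topic is selected (the "Default" view), and relevance when a topic is selected. Both can be computed before the data is handed to Altair, so each view becomes a precomputed column to filter on. A plain-Python sketch, assuming you already have the topic-term probabilities phi, the overall term probabilities p(w), and p(t | w); the function names and the lam=0.6 default are illustrative choices.

```python
import math


def relevance(phi_kw, p_w, lam=0.6):
    """Relevance of a term to a topic (Sievert & Shirley):
    lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def saliency(p_w, p_t_given_w, p_t):
    """Saliency of a term (Chuang et al.): p(w) times the KL divergence
    between p(t | w) and the marginal topic distribution p(t)."""
    distinctiveness = sum(
        ptw * math.log(ptw / pt) for ptw, pt in zip(p_t_given_w, p_t) if ptw > 0
    )
    return p_w * distinctiveness


# Rank the terms of one (made-up) topic by relevance.
phi_topic = {"dog": 0.5, "bone": 0.3, "the": 0.2}
p_w = {"dog": 0.2, "bone": 0.1, "the": 0.7}
ranked = sorted(phi_topic, key=lambda w: relevance(phi_topic[w], p_w[w]), reverse=True)
print(ranked)  # ['dog', 'bone', 'the']
```

With lam=1 the ranking reduces to plain topic-term probability; lowering lam pushes down terms like "the" that are frequent across the whole corpus.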

3 votes, 0 answers

Choosing the best coherence score for an LDA model

I am using the Python Gensim package to build an LDA model…
almegdadi (79 reputation)
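A common heuristic is to train one model per candidate number of topics, score each with a coherence measure, and keep the k with the highest mean coherence (or the point where the curve levels off). Gensim's `CoherenceModel` offers several measures, including `'u_mass'` and `'c_v'`. To show what such a score measures, this is a plain-Python sketch of UMass coherence with the +1 smoothing from Mimno et al.; Gensim's own implementation may differ in smoothing details, and the candidate topics below are made up for illustration.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words (ordered by rank) against
    tokenized docs: sum over pairs of log((D(w_i, w_j) + 1) / D(w_j)),
    where w_j ranks above w_i and D counts documents containing the words."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j]))
    return score


def best_num_topics(scores_by_k):
    """Pick the k whose model had the highest mean coherence."""
    return max(scores_by_k, key=scores_by_k.get)


docs = [["dog", "bone", "bark"], ["dog", "bark"], ["cat", "meow"], ["cat", "meow", "purr"]]
# Hypothetical top words from two candidate models (k=2 and k=3).
candidates = {
    2: [["dog", "bark"], ["cat", "meow"]],
    3: [["dog", "meow"], ["cat", "bark"], ["bone", "purr"]],
}
scores = {
    k: sum(umass_coherence(t, docs) for t in topics) / len(topics)
    for k, topics in candidates.items()
}
print(best_num_topics(scores))  # 2
```

Coherence values are only comparable within one measure and one corpus, so there is no universal "good" threshold; compare candidates against each other.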

3 votes, 1 answer

How to measure how distinct a document is based on predefined linguistic categories?

I have 3 categories of words that correspond to different types of psychological drives (need-for-power, need-for-achievement, and need-for-affiliation). Currently, for every document in my sample (n=100,000), I am using a tool to count the number…

3 votes, 3 answers

How to fix an LDA model coherence score runtime error?

text='Alice is a student.She likes studying.Teachers are giving a lot of homewok.' I am trying to get topics from a simple text (like the one above) with a coherence score. This is my LDA model: id2word = corpora.Dictionary(data_lemmatized) texts =…
Xena (35 reputation)

3 votes, 2 answers

Each row of the input matrix needs to contain at least one non-zero entry

I have this issue when I run this chunk of code: text_lda <- LDA(text_dtm, k = 2, method = "VEM", control = NULL). I get the following error: "Each row of the input matrix needs to contain at least one non-zero entry". Then I tried to solve this with…
coding (917 reputation)
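This error means at least one document has no terms left in the document-term matrix, typically because preprocessing (stop-word removal, sparse-term trimming) emptied it. The usual fix is to drop all-zero rows before fitting; in R that amounts to subsetting the matrix to rows whose sum is greater than zero. The same idea as a Python sketch on a plain list-of-lists matrix, with `drop_empty_rows` as a hypothetical helper that keeps document identifiers aligned with the remaining rows:

```python
def drop_empty_rows(doc_ids, dtm):
    """Keep only documents whose term-count row has at least one non-zero entry.

    Returns the filtered ids and rows so identifiers stay aligned with the matrix.
    """
    kept = [(doc_id, row) for doc_id, row in zip(doc_ids, dtm) if any(row)]
    return [d for d, _ in kept], [r for _, r in kept]


ids = ["a", "b", "c"]
dtm = [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
kept_ids, kept_rows = drop_empty_rows(ids, dtm)
print(kept_ids)  # ['a', 'c']
```

Keeping the ids alongside the rows matters if you later want to join the per-document topic estimates back to the original data.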

3 votes, 1 answer

PyLDAvis visualisation does not align with generated topics

I am using PyLDAvis to visualise the results of the LDA from Mallet. Before I can do that, I need the wrapper of the gensim library: model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(model_list[8]) When I print the found topics, they…
gython (865 reputation)
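One frequent cause of this kind of mismatch, assuming it applies here, is that pyLDAvis renumbers topics. By default it sorts topics by their share of tokens, largest first, and labels them from 1, while Gensim and MALLET index topics from 0. Passing `sort_topics=False` to pyLDAvis's `prepare` keeps the original order (labels still start at 1). The hypothetical helper below only reproduces the default relabeling so you can translate between the two numberings; here ties are broken by original index, which may not match pyLDAvis.

```python
def pyldavis_topic_labels(topic_token_counts):
    """Map each original 0-based topic index to the 1-based label shown when
    topics are sorted by their share of tokens, largest first."""
    order = sorted(
        range(len(topic_token_counts)),
        key=lambda t: topic_token_counts[t],
        reverse=True,
    )
    return {original: rank + 1 for rank, original in enumerate(order)}


# Hypothetical token counts per topic, in Gensim/MALLET order (topics 0, 1, 2).
labels = pyldavis_topic_labels([120, 300, 80])
print(labels)  # {1: 1, 0: 2, 2: 3}
```

So in this example, the topic Gensim prints as topic 1 would appear as topic 1 in pyLDAvis, but Gensim's topic 0 would appear as topic 2.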

3 votes, 1 answer

Structural Topic Modeling in R: Plot statistical significance for Topic Content

My question relates to structural topic modeling in R, specifically to the stm package developed by Roberts et al. (https://cran.r-project.org/web/packages/stm/vignettes/stmVignette.pdf). I implemented a structural topic model in order to…
RAnnR (31 reputation)

3 votes, 1 answer

Structural Topic Modeling (stm) Error in makeTopMatrix(prevalence, data) : Error creating model matrix

I'm trying to run the initial steps of this stm tutorial https://github.com/dondealban/learning-stm with this dataset, it is part of the original…
Ana (149 reputation)

3 votes, 0 answers

What is a "good" value for LSI topic coherence?

I'm using the gensim Python library to work on small corpora (around 1,500 press articles each time). Let's say I'm interested in creating clusters of articles reporting the same news. So for each corpus of articles I've tokenized, detected…
fbparis (880 reputation)

3 votes, 0 answers

Assign more weight to certain documents within the corpus - LDA - Gensim

I am using LDA for topic modelling but unfortunately my data is heavily skewed. I have documents from 10 different categories and would like each category to equally contribute to the LDA topics. However, each category has a varying number of…
Mia (559 reputation)
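One workaround that needs no special support from Gensim is to rebalance the corpus before training: oversample documents from the smaller categories until every category contributes the same number of documents. A minimal sketch, with `balance_by_category` as a hypothetical helper. Note that repeated documents reinforce their own word co-occurrences, and equal document counts do not guarantee equal token counts if document lengths differ by category.

```python
import random
from collections import defaultdict


def balance_by_category(docs, categories, seed=0):
    """Oversample documents so every category contributes the same number of
    documents: smaller categories are padded with random repeats of their own docs."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for doc, cat in zip(docs, categories):
        by_cat[cat].append(doc)
    target = max(len(group) for group in by_cat.values())
    balanced = []
    for group in by_cat.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest category.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


docs = [["a1"], ["a2"], ["a3"], ["b1"]]
categories = ["A", "A", "A", "B"]
balanced = balance_by_category(docs, categories)
print(len(balanced))  # 6
```

The balanced list of tokenized documents can then be turned into a dictionary and bag-of-words corpus as usual before training.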