Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, and "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).

Generative models (i.e. the statistical models used for topic modelling)

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
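The dog/cat intuition above is LDA's generative story. Each document draws a topic mixture from a Dirichlet prior. Each word position draws a topic from that mixture, and the word itself is drawn from that topic's word distribution. The following is a minimal NumPy sketch that assumes the topic-word matrix `phi` is already known and uses a symmetric prior `alpha`. Both names and the `generate_corpus` helper are illustrative only and do not come from any library.

```python
import numpy as np

def generate_corpus(phi, alpha, doc_lengths, seed=0):
    """Sample documents (lists of word ids) from LDA's generative process.

    phi: (K, V) array, row k is topic k's distribution over V words.
    alpha: symmetric Dirichlet concentration for each document's topic mixture.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    K, V = phi.shape
    docs = []
    for n in doc_lengths:
        theta = rng.dirichlet(np.full(K, alpha))   # this document's topic mixture
        z = rng.choice(K, size=n, p=theta)         # one topic per word position
        words = [int(rng.choice(V, p=phi[k])) for k in z]  # word from its topic
        docs.append(words)
    return docs
```

Fitting LDA inverts this process. Given only the words, it infers `phi` and each document's mixture.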

Software / Libraries

Related tags:

980 questions
3 votes, 3 answers

Print bigrams learned with gensim

I want to learn bigrams from a corpus using gensim and then just print the bigrams learned. I haven't seen an example that does this. Help appreciated. from gensim.models import Phrases documents = ["the mayor of new york was there", "human computer…
Aviad Rozenhek
  • 2,259
  • 3
  • 21
  • 42
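In gensim 4.x, a trained `Phrases` model has an `export_phrases()` method that returns a dict mapping each learned phrase to its score. Older 3.x releases use a different signature, so check the docs for your version. To make the scoring itself visible, here is a pure-Python sketch of the default ("original", Mikolov-style) formula `(count(a b) - min_count) / (count(a) * count(b)) * vocab_size`. It assumes whitespace tokenization and counts vocabulary size as distinct unigrams plus distinct bigrams. It approximates gensim's behaviour and is not gensim's code. `learn_bigrams` is a hypothetical helper.

```python
from collections import Counter

def learn_bigrams(documents, min_count=1, threshold=1.0):
    """Return {"a_b": score} for adjacent word pairs whose score exceeds threshold."""
    sentences = [doc.split() for doc in documents]
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    vocab_size = len(unigrams) + len(bigrams)  # assumption: mirrors a shared vocab
    learned = {}
    for (a, b), count in bigrams.items():
        score = (count - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        if score > threshold:
            learned[f"{a}_{b}"] = score
    return learned

docs = ["the mayor of new york was there",
        "machine learning can be useful sometimes",
        "new york mayor was present"]
for phrase, score in sorted(learn_bigrams(docs).items()):
    print(phrase, score)
```

With these three sentences, only "new york" occurs more than once, so it is the only pair that clears the threshold.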
3 votes, 1 answer

Adding topic distribution (output of a topic model) to a pandas DataFrame

I calculated a topic model, so far so good. First of all, my DataFrame looks like this: identifier comment_cleaned 1 some cleaned comment 2 another cleaned comment 8 ... ... Then I calculated my…
cian
  • 191
  • 2
  • 11
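The excerpt does not name the library, so this sketch assumes gensim. There, `LdaModel.get_document_topics(bow)` returns a sparse list of `(topic_id, probability)` pairs per document and omits topics below `minimum_probability`. Those lists have to be densified into one column per topic before joining them to the DataFrame. `add_topic_columns` and the `topic_k` column names are hypothetical, and the lists must be in the same order as the DataFrame's rows.

```python
import pandas as pd

def add_topic_columns(df, doc_topics, num_topics):
    """Append one probability column per topic to df.

    doc_topics: one list of (topic_id, probability) pairs per row of df,
    in row order; topics missing from a list are filled with 0.0.
    """
    if len(doc_topics) != len(df):
        raise ValueError("need exactly one topic list per DataFrame row")
    dense = [[0.0] * num_topics for _ in doc_topics]
    for row, pairs in zip(dense, doc_topics):
        for topic_id, prob in pairs:
            row[topic_id] = prob
    topic_df = pd.DataFrame(dense, index=df.index,
                            columns=[f"topic_{k}" for k in range(num_topics)])
    return df.join(topic_df)  # aligns on the shared index
```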
3 votes, 0 answers

NMF yields all-zero weights

Occasionally, when I run topic analyses on data and try to visualize the results with pyLDAvis, I get a validation error: "Not all rows (distributions) in doc_topic_dists sum to 1." Here's some basic code: tfidf_vectorizer =…
sw85
  • 53
  • 4
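NMF's document weights `W` (from `X ≈ W H`) are nonnegative but are not probability distributions. A document whose TF-IDF row is entirely zero, for example because the vectorizer's vocabulary filters removed all of its words, typically ends up with an all-zero row in `W`, and such a row cannot be normalized to sum to 1. The sketch below row-normalizes `W` and gives all-zero rows a uniform distribution. That uniform fallback is an assumption. Dropping those documents (`W.sum(axis=1) > 0`) is the other common choice. `to_doc_topic_dists` is a hypothetical helper.

```python
import numpy as np

def to_doc_topic_dists(W):
    """Row-normalize NMF weights so each row sums to 1 (uniform for all-zero rows)."""
    W = np.asarray(W, dtype=float)
    k = W.shape[1]
    totals = W.sum(axis=1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)   # avoid division by zero
    dists = W / safe
    dists[totals[:, 0] == 0] = 1.0 / k         # assumption: uniform fallback
    return dists
```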
3 votes, 1 answer

IndexError when trying to update gensim's LdaModel

I am facing the following error when trying to update my gensim LdaModel: IndexError: index 6614 is out of bounds for axis 1 with size 6614. I checked why other people were having this issue on this thread, but I am using the same dictionary…
V. Déhaye
  • 493
  • 6
  • 20
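"Axis 1 with size 6614" suggests that the model's topic-word matrix has 6614 term columns, so the valid token ids are 0 to 6613. Some document in the update corpus therefore carries id 6614 or higher. A likely cause is that the dictionary grew after the model was created, for example through `add_documents` or `doc2bow(..., allow_update=True)`. `LdaModel.update` does not enlarge the model's vocabulary. One workaround, sketched below, keeps only ids the model can index. Plain `dictionary.doc2bow(tokens)` with the original, unmodified dictionary also ignores unknown tokens by default. `drop_unknown_ids` is a hypothetical helper.

```python
def drop_unknown_ids(bow_corpus, num_terms):
    """Keep only (token_id, count) pairs with 0 <= token_id < num_terms."""
    return [[(tid, cnt) for tid, cnt in doc if 0 <= tid < num_terms]
            for doc in bow_corpus]

# Usage with a trained gensim model (not run here):
# lda.update(drop_unknown_ids(new_corpus, lda.num_terms))
```

Dropping ids means new words are simply ignored. If they matter, the usual alternative is to retrain on the enlarged dictionary.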
3 votes, 1 answer

How to use gensim's LDA to conduct text-retrievals from queries?

I am trying to understand how LDA can be used for text retrieval, and I am currently using gensim's LdaModel to implement LDA (https://radimrehurek.com/gensim/models/ldamodel.html). I have managed to identify the k topics and…
helpme
  • 33
  • 2
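One common way to use LDA for retrieval is to infer a topic distribution for the query (with gensim, `lda[dictionary.doc2bow(query_tokens)]`, densified with `gensim.matutils.sparse2full(vec, num_topics)`) and rank documents by how close their topic distributions are. The sketch below uses Hellinger distance, which gensim also provides as `gensim.matutils.hellinger`. It is written in plain Python so the ranking step is visible. The helper names are hypothetical, and this is one retrieval strategy among several.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def rank_documents(query_dist, doc_dists):
    """Return document indices ordered from most to least similar topic mixture."""
    return sorted(range(len(doc_dists)), key=lambda i: hellinger(query_dist, doc_dists[i]))

doc_dists = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(rank_documents([0.8, 0.2], doc_dists))
```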
3 votes, 2 answers

Removing special apostrophes from French article contractions when tokenizing

I am currently running an stm (structural topic model) on a series of articles from the French newspaper Le Monde. The model is working great, but I have a problem with the pre-processing of the text. I'm currently using the quanteda package…
kouta
  • 55
  • 6
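French elided forms (l', d', qu', jusqu', …) are often written with the typographic apostrophe ’ (U+2019) rather than ', so a pattern that handles only ' misses them. The sketch below strips a word-initial elided form followed by either apostrophe before tokenizing. The list of elided forms is an assumption and can be extended. Because the pattern requires a word boundary before the elided letters, words such as "aujourd’hui" are left intact. The sketch is in Python. The regex is the part to port to whatever tokenizer is used.

```python
import re

# Word-initial elided French forms followed by ' or ’ (U+2019); list is an assumption.
ELISION = re.compile(r"\b(?:qu|jusqu|lorsqu|puisqu|[cdjlmnst])['’]", re.IGNORECASE)

def strip_elisions(text):
    """Remove elided articles/pronouns so 'l’État' tokenizes as 'État'."""
    return ELISION.sub("", text)

print(strip_elisions("L’État et l'homme d’affaires qu’il connaît"))
```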
3 votes, 1 answer

How to generate a term matrix in GuidedLDA for topic modeling?

I am currently working on analyzing online reviews. I would like to try GuidedLDA (https://medium.freecodecamp.org/how-we-changed-unsupervised-lda-to-semi-supervised-guidedlda-e36a95f3a164) as some of the topics overlap. I have successfully…
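According to the linked post, GuidedLDA is fitted on a document-term count matrix `X` together with a `seed_topics` dict that maps word ids to topic ids, passed as something like `model.fit(X, seed_topics=seed_topics, seed_confidence=0.15)`. Check the package docs for the exact signature. The sketch below builds those inputs with NumPy only and assumes lowercase whitespace tokenization. `build_dtm` and `seed_topics_from_words` are hypothetical helpers. In practice, scikit-learn's `CountVectorizer` is a common alternative for building `X`.

```python
import numpy as np

def build_dtm(docs):
    """Return (X, vocab, word2id): X[d, w] counts word w in document d."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    word2id = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)), dtype=np.int64)
    for row, toks in enumerate(tokenized):
        for w in toks:
            X[row, word2id[w]] += 1
    return X, vocab, word2id

def seed_topics_from_words(seed_word_lists, word2id):
    """Map list-of-seed-word-lists to {word_id: topic_id}, skipping unknown words."""
    return {word2id[w]: t
            for t, words in enumerate(seed_word_lists)
            for w in words if w in word2id}
```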
3 votes, 2 answers

Mallet Topic Modelling API - How to decide number of intervals needed or best for optimization?

Sorry, I'm quite a beginner in NLP. As the title says, what is the best optimization interval in the Mallet API? I was also wondering whether it depends on the number of iterations, the number of topics, the corpus, etc.
3 votes, 1 answer

Error in Mallet Java

I want to do topic modelling, so I ran the command below: bin\mallet train-topics --input web.mallet --output-state output-file.gz It tells me: Topic modeling currently only supports feature sequences: use --keep-sequence option when…
shahrukh
  • 73
  • 5
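The message indicates that `web.mallet` was imported without word order, while `train-topics` needs feature sequences. The fix is to re-run the import step with `--keep-sequence`, as MALLET's own topic-modeling tutorial does, and then train on the new file. The sketch below only builds the two argument lists. The input directory and `mallet_commands` are hypothetical placeholders.

```python
def mallet_commands(input_dir, mallet_file="web.mallet",
                    state_file="output-file.gz", mallet_bin="bin\\mallet"):
    """Build MALLET argument lists: import with --keep-sequence, then train."""
    import_cmd = [mallet_bin, "import-dir", "--input", input_dir,
                  "--output", mallet_file, "--keep-sequence"]
    train_cmd = [mallet_bin, "train-topics", "--input", mallet_file,
                 "--output-state", state_file]
    return import_cmd, train_cmd
```

Each list can be run with `subprocess.run(cmd, check=True)`. On Windows, `mallet_bin` may need to point at the `.bat` launcher explicitly.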
3 votes, 0 answers

User Review - Topic modeling or Intent detection in R

I am doing social media analysis in R: reviewing user feedback on a particular business and trying to assign each user review to one or more categories/topics. For example: find whether a review talks about Neighborhood, Crime, etc.…
Karna Bhua
  • 31
  • 2
3 votes, 0 answers

Hierarchical Dirichlet Process in PyMC3

I'm trying to implement a Hierarchical Dirichlet Process (HDP) topic model using PyMC3. The HDP graphical model is shown below: I came up with the following code: import numpy as np import scipy as sp import pandas as pd import seaborn as sns import…
Vadim Smolyakov
  • 1,187
  • 11
  • 24
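A truncated HDP usually starts from stick-breaking. Global topic weights `β ~ GEM(γ)` come from `β_k = v_k ∏_{j<k} (1 - v_j)` with `v_k ~ Beta(1, γ)`. Under a truncation to K topics, each document's weights can then be drawn as `π_j ~ Dirichlet(α β)`. PyMC3's Dirichlet process mixture example defines a similar `stick_breaking` helper with Theano ops. Below is a NumPy version of the same transform for checking the arithmetic. The truncation level and names are illustrative, and this is not a full HDP implementation.

```python
import numpy as np

def stick_breaking(v):
    """Turn Beta draws v_1..v_K into weights beta_k = v_k * prod_{j<k} (1 - v_j)."""
    v = np.asarray(v, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick left before k
    return v * remaining

rng = np.random.default_rng(0)
K, gamma, alpha = 20, 1.0, 1.0                # truncation level and concentrations
beta = stick_breaking(rng.beta(1.0, gamma, size=K))
pi_doc = rng.dirichlet(alpha * beta)          # one document's topic weights
```

The K truncated weights sum to slightly less than 1. Setting the last `v_K = 1` assigns the leftover stick to the final topic.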
3 votes, 1 answer

What is the probability of a TERM for a specific TOPIC in Latent Dirichlet Allocation (LDA) in R

I'm working in R with the "topicmodels" package and trying to understand the code and the package better. In most of the tutorials and documentation I'm reading, people define topics by their 5 or 10 most probable terms. Here is an example: …
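In "topicmodels", `posterior(model)$terms` gives the topic-by-term probability matrix (each row is P(term | topic) and sums to 1), and `terms(model, 10)` lists each topic's 10 most probable terms. The model's `@beta` slot holds the same values on the log scale. The "top terms" are therefore just the largest entries in each row. The sketch below shows that selection in Python for illustration. `top_terms`, `phi`, and the toy vocabulary are hypothetical.

```python
import numpy as np

def top_terms(phi, vocab, n=5):
    """For each topic row of phi (P(term | topic)), return the n most probable terms."""
    phi = np.asarray(phi, dtype=float)
    result = []
    for row in phi:
        order = np.argsort(row)[::-1][:n]      # indices of largest probabilities
        result.append([(vocab[i], float(row[i])) for i in order])
    return result

phi = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
print(top_terms(phi, ["dog", "bone", "cat"], n=2))
```

If only log probabilities are available (as in `@beta`), exponentiate them first. The ranking is the same either way.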
3 votes, 0 answers

How to create topic names using LDA topic model

I am working with an LDA topic model in Python, which outputs the following topics: (0, u'0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option" + 0.012*"partner" + 0.011*"traffic" + 0.011*"hub" + 0.011*"thanks" + 0.010*"city" +…
Arman
  • 827
  • 3
  • 14
  • 28
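LDA does not produce topic names. A provisional label is usually built from a topic's top words, and a human then refines it. With gensim, `lda.show_topic(topic_id, topn=10)` returns `(word, probability)` pairs directly, which avoids parsing the printed string. The sketch below handles the case where only the printed `weight*"word"` form is at hand. `parse_topic` and `provisional_label` are hypothetical helpers, and joining the top three words is an arbitrary naming choice.

```python
import re

TERM = re.compile(r'([0-9.]+)\*"([^"]+)"')   # matches e.g. 0.559*"delivery"

def parse_topic(topic_str):
    """Turn '0.559*"delivery" + 0.124*"area" ...' into [("delivery", 0.559), ...]."""
    return [(word, float(weight)) for weight, word in TERM.findall(topic_str)]

def provisional_label(pairs, n=3):
    """Join the n highest-weighted words as a placeholder topic name."""
    ranked = sorted(pairs, key=lambda p: p[1], reverse=True)
    return "_".join(word for word, _ in ranked[:n])

pairs = parse_topic('0.559*"delivery" + 0.124*"area" + 0.018*"mile" + 0.016*"option"')
print(provisional_label(pairs))
```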
3 votes, 1 answer

Are there any R packages or published code on topic models that account for time?

I am trying to perform topic modeling on a data set of political speeches that spans two centuries, and I would ideally like to use a topic model that accounts for time, such as Topics over Time (Wang and McCallum 2006) or the Dynamic Topic Model (Blei…
aeiz
  • 31
  • 3
3 votes, 0 answers

Interpretation of SVD for text mining topic analysis

Background: I'm learning about text mining by building my own text mining toolkit from scratch (the best way to learn!). SVD: The Singular Value Decomposition is often cited as a good way to visualise high-dimensional data (word-document matrix) in…
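For a term-document matrix `X = U Σ Vᵀ`, keeping the top k singular values gives the best rank-k approximation of `X` in the least-squares sense. This is latent semantic analysis (LSA). Rows of `U_k Σ_k` are term coordinates, and rows of `V_k Σ_k` are document coordinates in the same k latent dimensions. Each singular value measures how much of the matrix that dimension captures. Unlike LDA topics, these dimensions mix positive and negative weights, so they are not probability distributions over words. The following is a minimal NumPy sketch. `lsa` is a hypothetical helper.

```python
import numpy as np

def lsa(term_doc, k):
    """Return (term_coords, doc_coords, singular_values) for a rank-k LSA."""
    U, s, Vt = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    term_coords = U[:, :k] * s[:k]       # one row per term
    doc_coords = Vt[:k].T * s[:k]        # one row per document
    return term_coords, doc_coords, s

X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # 3 terms x 2 documents
terms, docs, s = lsa(X, 2)
```

Plotting the first two columns of `terms` and `docs` gives the usual 2-D visualisation. Terms and documents that point in similar directions are associated.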