Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).

Generative models (i.e. the statistical models used for topic modelling)

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
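
The generative story behind LDA can be illustrated with a small sketch. This is only an illustration under stated assumptions, not a fitted model: the two topics, their word probabilities, the `alpha` values, and the helper names `sample_dirichlet` and `generate_document` are all hypothetical and chosen to mirror the dog/cat intuition above. It uses only the Python standard library.

```python
import random

# Hypothetical, hand-written topic-word distributions (assumed for illustration).
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.3, "walk": 0.2},
    "cats": {"cat": 0.5, "meow": 0.3, "nap": 0.2},
}


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, rng):
    """Generate one document following the LDA generative story."""
    names = list(TOPICS)
    # 1. Draw the document's topic mixture theta ~ Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Draw a topic for this word position from theta.
        topic = rng.choices(names, weights=theta, k=1)[0]
        # 3. Draw the word from that topic's word distribution.
        vocab = list(TOPICS[topic])
        weights = list(TOPICS[topic].values())
        words.append(rng.choices(vocab, weights=weights, k=1)[0])
    return theta, words


rng = random.Random(0)  # fixed seed so repeated runs give the same document
theta, words = generate_document(n_words=10, alpha=[0.5, 0.5], rng=rng)
print("topic mixture:", dict(zip(TOPICS, theta)))
print("document:", " ".join(words))
```

Fitting a topic model runs this story in reverse: given only the documents, it estimates the topic mixtures and the topic-word distributions that most plausibly produced them.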

980 questions
4
votes
1 answer

TopicModel: How to query documents by topic model "topic"?

Below I created a full reproducible example to compute the topic model for a given DataFrame. import numpy as np import pandas as pd data = pd.DataFrame({'Body': ['Here goes one example sentence that is generic', 'My car drives…
Christopher
  • 2,120
  • 7
  • 31
  • 58
4
votes
2 answers

Python topic modelling error in mallet

Hi, I was using gensim with Mallet for topic modelling. I unzipped Mallet on the C drive as shown and also set the MALLET_HOME environment variable. The code I was executing is mallet_path =…
Anurag
  • 41
  • 4
4
votes
2 answers

How to classify a sentence into one of the pre-defined topic bucket using an unsupervised approach

I am working on a project to classify customer feedback into buckets based on the topic of each feedback comment. So, I need to classify each sentence into one topic from a list of pre-defined topics. For example: "I keep getting an error…
Prajwal
  • 93
  • 1
  • 9
4
votes
1 answer

How to use DBpedia properties to build a topic hierarchy?

I am trying to build a topic hierarchy by following these two DBpedia properties: skos:broader and dcterms:subject. My intention is, given a word, to identify its topic. For example, given the word 'support vector…
J Cena
  • 963
  • 2
  • 11
  • 25
4
votes
1 answer

Classifying new text using LDA in R

I am trying out topic modeling using R for the first time. So, this might be a very dumb question but I am stuck and googling has not given a definitive answer. Given a corpus of documents, I used the LDA function to identify the different topics…
4
votes
0 answers

Graph only partially displaying in Jupyter Notebook output

I am trying to get a PyLDAvis graph that looks like the 2 shown in this link, that you can see right away (Intertopic Distance Map and Top 30 Most Salient Terms):…
4
votes
1 answer

Dynamic topic models/topic over time in R

I have a database of newspaper articles about water policy from 1998 to 2008. I would like to see how the newspaper coverage changes during this period. My question is, should I use Dynamic Topic Modeling or the Topics over Time model to handle this…
user7453767
  • 339
  • 2
  • 14
4
votes
1 answer

probability distribution of topics using NMF

I use the following code to do the topic modeling on my documents: from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer tfidf_vectorizer = TfidfVectorizer(tokenizer=tokenize, max_df=0.85, min_df=3, ngram_range=(1,5)) tfidf =…
4
votes
1 answer

What's the best way to compare several corpora in natural language?

I've been building LDA topic models of natural-language narrative reports for a research project (using Gensim with Python). I have several smallish corpora (from 1400 to 200 docs each – I know, that's tiny!) that I'd like to compare, but I don't…
Paul Miller
  • 493
  • 1
  • 5
  • 13
4
votes
1 answer

how to improve word assignement in different topics in lda

I am working on a language that is not English, and I have scraped the data from different sources. I have done my preprocessing, such as punctuation removal, stop-word removal and tokenization. Now I want to extract domain-specific lexicons. Let's…
user3778289
  • 323
  • 4
  • 18
4
votes
2 answers

Latent Dirichlet Allocation with prior topic words

Context: I'm trying to extract topics from a set of texts using Latent Dirichlet Allocation from Scikit-Learn's decomposition module. This works really well, except for the quality of the topic words found/selected. In an article by Li et al. (2017), the…
Philip
  • 2,888
  • 2
  • 24
  • 36
4
votes
2 answers

Mallet topic modeling - topic keys output parameter

In MALLET topic modelling, the --output-topic-keys [FILENAME] option outputs, beside each topic, a parameter that the tutorial on the MALLET site calls the "Dirichlet parameter" of the topic. What does this parameter represent? Is it…
Mahmoud Yusuf
  • 309
  • 2
  • 13
4
votes
4 answers

about lda inference

Right now, I'm using the LDA topic modelling tool from the MALLET package to do some topic detection on my documents. Everything was fine initially; I got 20 topics from it. However, when I try to infer topics for a new document using the model, the result is kinda…
goh
  • 27,631
  • 28
  • 89
  • 151
4
votes
1 answer

Extract topic word probability matrix in gensim LdaModel

I have the LDA model and the document-topic probabilities. # build the model on the corpus ldam = LdaModel(corpus=corpus, num_topics=20, id2word=dictionary) # get the document-topic probabilities theta, _ = ldam.inference(corpus) I also need the…
Clock Slave
  • 7,627
  • 15
  • 68
  • 109
4
votes
1 answer

Dynamic number of topics in topic models

I am new to topic modelling. My aim is to find key topics from a document. I am planning to use LDA for this purpose. But in LDA the number of topics must be predefined. I believe if a document from some other domain which was not in the training…
Jishad AV
  • 79
  • 4