Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats (source: Wikipedia).
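
The intuition above can be sketched in a few lines of plain Python. This is only an illustration, not how any particular library works: the topic names, the word probabilities in `TOPIC_WORD_PROBS`, and the helper names are all made up for the example. Given fixed per-topic word probabilities, a document is scored against each topic and the most likely topic wins.

```python
import math

# Hypothetical per-topic word probabilities (invented for illustration).
TOPIC_WORD_PROBS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "cat": 0.05, "meow": 0.05, "the": 0.2},
    "cats": {"dog": 0.05, "bone": 0.05, "cat": 0.4, "meow": 0.3, "the": 0.2},
}


def log_likelihood(words, word_probs, unseen=1e-6):
    """Sum of log P(word | topic); words missing from the topic get a small floor."""
    return sum(math.log(word_probs.get(w, unseen)) for w in words)


def most_likely_topic(text):
    """Return the topic under which the whitespace-split text is most probable."""
    words = text.lower().split()
    scores = {t: log_likelihood(words, p) for t, p in TOPIC_WORD_PROBS.items()}
    return max(scores, key=scores.get)


print(most_likely_topic("the dog chewed the bone"))  # prints: dogs
```

A real topic model has to learn these word probabilities from the corpus, and it lets one document mix several topics rather than picking a single one.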

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
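
As a rough sketch of how LDA can be fitted, here is a small collapsed Gibbs sampler in plain Python. It is a teaching example under stated assumptions, not the implementation used by Mallet, Gensim or any other library: the function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`), and the toy documents are all chosen for illustration. HDP, which also infers the number of topics, is not covered here.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized docs; return (doc-topic proportions, topic-word counts, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                           # tokens per topic
    assignments = []                                         # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


toy_docs = [["dog", "bone", "dog"], ["cat", "meow", "cat"], ["dog", "cat"]]
theta, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

On a corpus this small the topics found depend heavily on the random seed, so the output should be read as a demonstration of the mechanics only.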

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related tags:

topicmodels

980 questions

1 answer

TopicModel: How to query documents by topic model "topic"?

Below I created a full reproducible example to compute the topic model for a given DataFrame. import numpy as np import pandas as pd data = pd.DataFrame({'Body': ['Here goes one example sentence that is generic', 'My car drives…

asked Jul 20 '18 at 18:57

Christopher

2 answers

Python topic modelling error in mallet

Hi, I was using gensim with Mallet for topic modelling. I unzipped Mallet on the C drive as shown and also set the MALLET_HOME environment variable. The code I was executing is mallet_path =…

python-3.x topic-modeling mallet

asked Jun 07 '18 at 13:05

Anurag

2 answers

How to classify a sentence into one of the pre-defined topic bucket using an unsupervised approach

I am working on a project to classify customer feedback into buckets based on the topic of the feedback comment. So, I need to classify the sentence into one of the topics among a list of pre-defined topics. For example: "I keep getting an error…

python machine-learning nlp gensim topic-modeling

asked May 16 '18 at 08:49

Prajwal

1 answer

How to use DBpedia properties to build a topic hierarchy?

I am trying to build a topic hierarchy by following two DBpedia properties: the skos:broader property and the dcterms:subject property. My intention is, given a word, to identify its topic. For example, given the word 'support vector…

nlp semantic-web dbpedia topic-modeling spotlight-dbpedia

asked Apr 16 '18 at 00:13

J Cena

1 answer

Classifying new text using LDA in R

I am trying out topic modeling using R for the first time. So, this might be a very dumb question but I am stuck and googling has not given a definitive answer. Given a corpus of documents, I used the LDA function to identify the different topics…

r text-mining lda topic-modeling

asked Jan 15 '18 at 18:14

Abhishek Sourabh

0 answers

Graph only partially displaying in Jupyter Notebook output

I am trying to get a PyLDAvis graph that looks like the 2 shown in this link, that you can see right away (Intertopic Distance Map and Top 30 Most Salient Terms):…

python python-3.x lda topic-modeling graph-visualization

asked Dec 27 '17 at 17:43

bernando_vialli

1 answer

Dynamic topic models/topic over time in R

I have a database of newspaper articles about the water policy from 1998 to 2008. I would like to see how the newspaper release changes during this period. My question is, should I use Dynamic Topic Modeling or Topic Over Time model to handle this…

r text-mining topic-modeling

asked Dec 23 '17 at 13:32

user7453767

1 answer

probability distribution of topics using NMF

I use the following code to do the topic modeling on my documents: from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer tfidf_vectorizer = TfidfVectorizer(tokenizer=tokenize, max_df=0.85, min_df=3, ngram_range=(1,5)) tfidf =…

scikit-learn topic-modeling nmf

asked Oct 10 '17 at 07:14

Monica Muller

1 answer

What's the best way to compare several corpora in natural language?

I've been doing LDA topic models of narrative reports in natural language for a research project (using Gensim with python). I have several smallish corpora (from 1400 to 200 docs each – I know, that's tiny!) that I'd like to compare, but I don't…

python nlp nltk lda topic-modeling

asked Sep 01 '17 at 14:01

Paul Miller

1 answer

how to improve word assignement in different topics in lda

I am working on a language that is not English and I have scraped the data from different sources. I have done my preprocessing, such as punctuation removal, stop-word removal and tokenization. Now I want to extract domain-specific lexicons. Let's…

python nltk lda topic-modeling

asked Aug 22 '17 at 16:27

user3778289

2 answers

Latent Dirichlet Allocation with prior topic words

Context: I'm trying to extract topics from a set of texts using Latent Dirichlet allocation from Scikit-Learn's decomposition module. This works really well, except for the quality of topic words found/selected. In an article by Li et al. (2017), the…

python scikit-learn nlp topic-modeling

asked Jul 18 '17 at 14:46

Philip

2 answers

Mallet topic modeling - topic keys output parameter

In MALLET topic modelling, the --output-topic-keys [FILENAME] option outputs beside each topic a parameter that the tutorial on the MALLET site calls the "Dirichlet parameter" of the topic. I want to know what this parameter represents. Is it…

topic-modeling mallet

asked Jul 18 '17 at 09:05

Mahmoud Yusuf

4 answers

about lda inference

Right now, I'm using the LDA topic modelling tool from the MALLET package to do some topic detection on my documents. Everything was fine initially, and I got 20 topics from it. However, when I try to infer a new document using the model, the result is kinda…

nlp topic-modeling mallet

asked Dec 07 '10 at 07:34

goh

1 answer

Extract topic word probability matrix in gensim LdaModel

I have the LDA model and the document-topic probabilities. # build the model on the corpus ldam = LdaModel(corpus=corpus, num_topics=20, id2word=dictionary) # get the document-topic probabilities theta, _ = ldam.inference(corpus) I also need the…

python gensim lda topic-modeling

asked Feb 17 '17 at 05:09

Clock Slave

1 answer

Dynamic number of topics in topic models

I am new to topic modelling. My aim is to find key topics from a document. I am planning to use LDA for the purpose. But in LDA the number of topics should be predefined. I believe if a document from some other domain which was not in the training…

nlp lda gensim topic-modeling

asked Nov 16 '16 at 16:41

Jishad AV
