Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
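The intuition above can be sketched as a tiny generative process. This is only an illustration under stated assumptions: the topic names, the word probabilities, and the helper `generate_document` are invented for this example and do not come from any library.

```python
import random
from collections import Counter

# Hypothetical topic-word probabilities, invented for illustration only.
TOPICS = {
    "dogs": {"dog": 0.5, "bone": 0.3, "walk": 0.2},
    "cats": {"cat": 0.5, "meow": 0.3, "nap": 0.2},
}


def generate_document(topic_weights, length, rng):
    """Draw `length` words: pick a topic for each word, then a word from that topic."""
    topic_names = list(topic_weights)
    weights = [topic_weights[name] for name in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=weights)[0]
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return words


rng = random.Random(0)
# A document that is mostly "about dogs" (assumed 90% dogs, 10% cats).
doc = generate_document({"dogs": 0.9, "cats": 0.1}, 200, rng)
counts = Counter(doc)
print(counts.most_common(3))
```

Counting the words in `doc` should show the dog-related words dominating, which is the pattern a topic model tries to recover in reverse: from observed word counts back to the hidden topics.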

Generative models (i.e., the statistical models used for topic modeling):

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
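As a rough illustration of how LDA can be fitted, here is a minimal collapsed Gibbs sampler in plain Python. It is a sketch, not the implementation used by any particular package: the function name `lda_gibbs`, the toy documents, and the hyperparameter values `alpha` and `beta` are assumptions chosen for the example.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]  # words in doc d assigned to topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics  # total words assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy corpus, invented for illustration.
docs = [
    ["dog", "bone", "dog", "walk"],
    ["cat", "meow", "cat", "nap"],
    ["dog", "bone", "bone"],
    ["cat", "meow", "meow"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(counts, key=counts.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

For real corpora, an established library is a better choice than a hand-written sampler like this one. HDP differs mainly in that the number of topics is inferred rather than fixed in advance.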

980 questions
0 votes, 1 answer

R topic modeling - lda command 'lexicalize' giving unexpected results

I am using the 'lda' package in R to perform a topic model analysis of a corpus (let's call it 'corpusB'). I am preparing the corpus for the analysis by first using the command 'lexicalize', which returns a term-document matrix and, if not…
0 votes, 2 answers

The output of cvb in mahout 0.7

I'm running Mahout 0.7 on Hadoop 1.0.4. I want to see the results on the Reuters dataset for the topic modeling task. However, I'm getting kinda useless results when I use the vectordump tool in Mahout. I've read the following set of instructions for…
Yaser Kenesh
0 votes, 2 answers

Gensim topic printing errors/issues

All, This is a re-post to what I responded to over in this thread. I am getting some totally screwy results with trying to print LSI topics in gensim. Here is my code: try: from gensim import corpora, models except ImportError as err: print…
aeupinhere
0 votes, 1 answer

About the inference result of Blei's lda-c-dist

I have a question about the inference result of the lda-c-dist package. How many words should be displayed when viewing the results of inference? For example, if I set the number of words to a very large number N (assume the number of all terms is N), it seems to…
Peiyun
0 votes, 1 answer

how to pipe an R LDA topic model into Topic Model Visualization Engine (TMVE)?

What's a good framework for building a topic model and topic browser in Python? documents --> topic model --> topic browser Topic Model Visualization Engine (TMVE) might pipe the results of Latent Dirichlet Allocation and arrange them into…
john mangual
0 votes, 1 answer

Read CSV error in Stanford Topic Modeling Toolbox

I am trying to use the Stanford Topic Modeling Toolbox (TMT) to try out Topic Modeling [0]. I am a Scala beginner. However, I can't seem to prepare my data set by reading a CSV file. Here's my code import scalanlp.io._; val source =…
Dexter
0 votes, 1 answer

Mahout LDA how to predict the topic on test data set?

From the Apache Mahout website https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html I am able to see the procedure to fit an LDA model and output the computed topics in the form of P("word"|"topic number"). However, there is no…
Rkz
-1 votes, 1 answer

Integrate GridSearchCV with LDA Gensim

Data Source: Glassdoor reviews split into two dataframe columns, "Pros" & "Cons". Pros refer to what the employees liked about the company; Cons refer to what the employees didn't like about the company. I already did all the…
-1 votes, 1 answer

Break down text into units of sense - text segmentation NLP Python

I have a dataframe text column (in French) and I want to split each text into sentences by their meaning (break down text into units of sense). Any idea how to do it with Python libraries and NLP techniques? P.S. I tried NLTK sent_tokenize and…
Paradisum
-1 votes, 1 answer

How to bypass default parameter to include a range or better SQL?

EDITED (AGAIN): added tables and two screenshots (one of a Google Sheets chart and another showing multiple issues in DS) to help demonstrate what I am seeing. Short Version: I have created a parameter to help me score trending topics based on the…
-1 votes, 1 answer

Is there a way to check which topic a word would be in?

I have used Gensim's LDA topic modeling to create 6 topics. But now I would like to give the model a word and see which topic it would fall under. Is this possible? If so, through which method? Ex. Enter word('Fitness') => LDA Model => Percentage…
-1 votes, 2 answers

suggest deep learning model for text topic classification

I have a dataset consisting of two columns [Text, topic_labels]. Topic_labels fall into 6 categories, e.g. [plants, animals, birds, insects, etc.]. I want to build deep learning-based models to classify topic_labels. So far I have…
-1 votes, 1 answer

Is there a Gensim or any other Python package function to automatically generate a labeling for topic models?

I have a set of topic models generated by Gensim's LDA model. I would like them to be automatically labeled so I can pick meaningful labels for each topic more easily. I have come across a function in the R language's textmineR package called…
-1 votes, 2 answers

How to loop over multiple lists?

I have 13 different lists of words. As I am doing topic modelling, I want to clean them, create a corpus, call get_document_topics, and concatenate the results of all the lists. The code for doing the process over one list, i.e. eastern_data_words, is shown…
-1 votes, 1 answer

Topic Modeling: graphical representation of words with the greatest differences between two topics

In Text Mining with R, methods for unsupervised classification of documents, such as blog posts or news articles, are introduced. This is the work of topic modeling. I'm running the code enclosed in this link, but I do not know how to obtain Figure 6.3,…
Mark