Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative probabilistic model in which sets of observations are explained by unobserved groups, and those groups account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, and each topic emits words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
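The generative story in the tag description (documents as mixtures of topics, topics as distributions over words) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not how any particular library implements LDA: the function name `generate_corpus`, the symmetric Dirichlet priors `alpha` and `beta`, and the fixed document length are all hypothetical choices made for the example.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.3, beta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process.

    alpha and beta are assumed symmetric Dirichlet concentration
    parameters; words are represented by integer ids in [0, vocab_size).
    """
    rng = np.random.default_rng(seed)

    # One word distribution per topic: shape (n_topics, vocab_size).
    topic_word = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

    # One topic mixture per document: shape (n_docs, n_topics).
    doc_topic = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)

    docs = []
    for d in range(n_docs):
        # Pick a topic for every word position from the document's mixture.
        topics = rng.choice(n_topics, size=doc_len, p=doc_topic[d])
        # Draw each word from the word distribution of its assigned topic.
        words = np.array(
            [rng.choice(vocab_size, p=topic_word[k]) for k in topics]
        )
        docs.append(words)

    return docs, doc_topic, topic_word


if __name__ == "__main__":
    docs, doc_topic, topic_word = generate_corpus(
        n_docs=3, doc_len=8, n_topics=2, vocab_size=6
    )
    for d, words in enumerate(docs):
        print(f"doc {d}: mixture={np.round(doc_topic[d], 2)} words={words}")
```

Fitting an LDA model runs this process in reverse: given only the word ids in `docs`, inference tries to recover plausible values for `doc_topic` and `topic_word`.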

1175 questions
0
votes
1 answer

How to pipe an R LDA topic model into Topic Model Visualization Engine (TMVE)?

What's a good framework for building a topic model and topic browser in Python? documents --> topic model --> topic browser Topic Model Visualization Engine (TMVE) might pipe the results of Latent Dirichlet Allocation and arrange them into…
john mangual
  • 7,718
  • 13
  • 56
  • 95
0
votes
1 answer

Mahout LDA: what is the largest dictionary size that can practically be used?

I am running Mahout's LDA on EC2 (using Whirr). What is the largest vocabulary that you have been able to use in practice? Could you share some Hadoop/EC2 settings? Ideally, I would like to run LDA on a corpus of 3M documents (1B tokens), with a…
Renaud
  • 16,073
  • 6
  • 81
  • 79
0
votes
1 answer

Mahout LDA: how to predict the topic on a test data set?

From the Apache Mahout website https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html I can see the procedure to fit an LDA model and output the computed topics in the form of P("word"|"topic number"). However, there is no…
Rkz
  • 1,237
  • 5
  • 16
  • 30
0
votes
0 answers

How can I do a classifier in LDA (manual)

I'm trying to implement the LDA classification rule in R using Euclidean distance: g(x) = t(w)x - wo, where w is my eigenvector, x is my test data, and wo is the mean of the two classes. My question is, how can I pass the model (project data) to model…
-1
votes
1 answer

Integrate GridSearchCV with LDA Gensim

Data Source: Glassdoor reviews split into two dataframe columns, "Pros" and "Cons". Pros refer to what the employees liked about the company. Cons refer to what the employees didn't like about the company. I already did all the…
-1
votes
1 answer

Bilingual Latent Dirichlet Allocation into [a Modified] K-Means Clustering Algorithm

I have a thesis paper that focuses on using Bilingual LDA and a modified version (modified for runtime) of K-Means for Sentiment Analysis (using Multinomial NB) on Filipino and English COVID-19 Tweets. I have the files that came from my Bi-LDA from…
-1
votes
1 answer

Gensim topic modelling with suggested initial inputs?

I'm doing an LDA topic model on a medium-sized corpus using gensim in Python. We already know roughly some of the topics we're expecting. In particular, we know that a particular topic definitely exists within the corpus and we want the model to…
-1
votes
1 answer

Carrying out an LDA and predict data

I had the following dataset library(MASS) install.packages("gclus") data(wine) View(wine) install.packages("car") I wanted to split it according to the proportions 70:30 into a training and a test set. Also I wanted to carry out LDA for the…
nils
  • 25
  • 5
-1
votes
1 answer

Is there a way to check which topic a word would be in?

I have used Gensim's LDA topic modeling to create 6 topics. But now I would like to give the model a word and see which topic that would fall under. Is this possible? If so through which method? Ex. Enter word('Fitness') => LDA Model => Percentage…
-1
votes
1 answer

Topic modeling, creating a subplot of trained LDA

I want to visualize my top 25 topic models using word clouds, with the subplots placed side by side. I have trained the model. The topics contain the trained LDA model. Below is my code: from gensim.test.utils import common_texts from…
-1
votes
1 answer

List with string code names to numeric codes in python3

I am very new to Python. I have this list called 'prediction' with results from an LDA classification problem. The elements in 'prediction' are strings, which I want to convert to numeric values. I am doing it by brute force like: aux2 =…
Gugumatz
  • 3
  • 1
-1
votes
1 answer

Topic Modeling: graphical representation of words with the greatest differences between two topics

In Text Mining with R, methods for unsupervised classification of documents, such as blog posts or news articles, are introduced. This is the work of topic modeling. I'm running the code enclosed in this link, but I do not know how to obtain Figure 6.3,…
Mark
  • 1,577
  • 16
  • 43
-1
votes
2 answers

Error "too many values to unpack" when trying to get similarities in Gensim using LDA model

I'm using an Anaconda environment with Python 3.7 and gensim 3.8.0. I have my data as a dataframe that I separated into a test and a training set; they both have this structure: X_test and Xtrain dataframe format : id …
brandata
  • 81
  • 9
-1
votes
2 answers

Why do different runs of the same iteration produce different results?

I've created a dictionary with the document-topic probabilities from a Gensim LDA model. Each iteration over the dictionary (even with the same exact code) produces slightly different values. Why is this? (Note, when the same code is copied and…
Dror M
  • 63
  • 8
-1
votes
1 answer

How can I get the topic coherence score of two models and then use it for comparison?

I want to get the topic coherence for the LDA model. Let's say I have two LDA models, one with a bag of words and the second with a bag of phrases. How can I get the coherence for these two models and then compare them on the basis of coherence?
user3778289
  • 323
  • 4
  • 18