Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model in which sets of observations are explained by unobserved groups, which account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, each of which emits words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
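
The generative story in the description above can be sketched in a few lines. This is only an illustration under stated assumptions, not any library's implementation: the helper names (`sample_dirichlet`, `generate_document`), the toy sizes (3 topics, a 6-word vocabulary, 20 words), and the symmetric prior value of 0.5 are all made up for the example.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, topic_word, alpha, rng):
    """Sample one document following the LDA generative process (hypothetical helper)."""
    n_topics = len(topic_word)
    vocab = range(len(topic_word[0]))
    # Per-document topic mixture, drawn from a symmetric Dirichlet prior.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    # Each word position is first assigned to one topic from the mixture...
    topics = rng.choices(range(n_topics), weights=theta, k=n_words)
    # ...and the word itself is then drawn from that topic's word distribution.
    words = [rng.choices(vocab, weights=topic_word[z], k=1)[0] for z in topics]
    return theta, topics, words

rng = random.Random(0)
# Assumed toy setup: 3 topics, each a distribution over a 6-word vocabulary.
topic_word = [sample_dirichlet([0.5] * 6, rng) for _ in range(3)]
theta, topics, words = generate_document(20, topic_word, alpha=0.5, rng=rng)
```

Here `theta` is the document's topic mixture, `topics` holds the topic assigned to each word position, and `words` holds the resulting word ids. Fitting an LDA model amounts to inferring quantities like `theta` and `topic_word` when only `words` is observed.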

1175 questions

votes

1 answer

How to pipe an R LDA topic model into Topic Model Visualization Engine (TMVE)?

What's a good framework for building a topic model and topic browser in Python? documents --> topic model --> topic browser Topic Model Visualization Engine (TMVE) might pipe the results of Latent Dirichlet Allocation and arrange them into…

asked Dec 14 '12 at 04:18

john mangual

votes

1 answer

Mahout LDA: what is the largest dictionary size that can practically be used?

I am running Mahout's LDA on EC2 (using Whirr). What is the largest vocabulary that you have been able to use in practice? Could you share some Hadoop/EC2 settings? Ideally, I would like to run LDA on a corpus of 3M documents (1B tokens), with a…

amazon-ec2 mahout bigdata lda

asked Dec 06 '12 at 02:32

Renaud

votes

1 answer

Mahout LDA: how to predict the topic on a test data set?

From the Apache Mahout website https://cwiki.apache.org/MAHOUT/latent-dirichlet-allocation.html I am able to see the procedure to fit an LDA model and output the computed topic in the form of P("word"|"topic number"). However, there is no…

mahout lda topic-modeling

asked Sep 21 '12 at 06:05

Rkz

votes

0 answers

How can I do a classifier in LDA (manual)

I'm trying to implement the classification rule of LDA in R using the Euclidean distance: g(x) = t(w)x - wo, where w is my eigenvector, x is my test data, and wo is the mean of the two classes. My question is, how can I pass the model (project data) to model…

r classification lda

asked Sep 05 '12 at 16:04

Jessica Medina

-1

votes

1 answer

Integrate GridSearchCV with LDA Gensim

Data source: Glassdoor reviews split into two dataframe columns, "Pros" and "Cons". Pros refer to what the employees liked about the company; Cons refer to what the employees didn't like about the company. I already did all the…

machine-learning lda topic-modeling grid-search gridsearchcv

asked Jun 30 '23 at 17:40

userrr

-1

votes

1 answer

Bilingual Latent Dirichlet Allocation into [a Modified] K-Means Clustering Algorithm

I have a thesis paper that focuses on using Bilingual LDA and a modified version (modified for runtime) of K-Means for Sentiment Analysis (using Multinomial NB) on Filipino and English COVID-19 Tweets. I have the files that came from my Bi-LDA from…

machine-learning data-science k-means lda naivebayes

asked Nov 02 '22 at 04:07

JOHN LOUISE LAGAZO

-1

votes

1 answer

Gensim topic modelling with suggested initial inputs?

I'm doing an LDA topic model on a medium-sized corpus using gensim in Python. We already know roughly some of the topics we're expecting. In particular, we know that a particular topic definitely exists within the corpus and we want the model to…

python gensim lda

asked Oct 31 '22 at 13:48

Gareth Pearce

-1

votes

1 answer

Carrying out an LDA and predicting data

I had the following dataset library(MASS) install.packages("gclus") data(wine) View(wine) install.packages("car") I wanted to split it according to the proportions 70:30 into a training and a test set. Also I wanted to carry out LDA for the…

r predict lda

asked Jun 22 '22 at 07:34

nils

-1

votes

1 answer

Is there a way to check which topic a word would be in?

I have used Gensim's LDA topic modeling to create 6 topics. But now I would like to give the model a word and see which topic that would fall under. Is this possible? If so through which method? Ex. Enter word('Fitness') => LDA Model => Percentage…

python gensim lda topic-modeling

asked Apr 08 '22 at 15:14

Ram Kaashyap

-1

votes

1 answer

Topic modeling, creating a subplot of trained LDA

I want to visualize my top 25 topic models using a word cloud, with the subplots placed side by side. I have trained the model. The topics contain the trained LDA model. Below is my code: from gensim.test.utils import common_texts from…

python lda word-cloud

asked Feb 28 '22 at 20:49

JOHNPAUL ADIMONYEMMA

-1

votes

1 answer

List with string code names to numeric codes in python3

I am very new to Python. I have this list called 'prediction' with results from an LDA classification problem. The elements in 'prediction' are strings, which I want to convert to numeric values. I am doing it by brute force like: aux2 =…

python lda

asked Feb 16 '22 at 10:07

Gugumatz

-1

votes

1 answer

Topic Modeling: graphical representation of words with the greatest differences between two topics

In Text Mining with R, methods for unsupervised classification of documents, such as blog posts or news articles, are introduced. This is the job of topic modeling. I'm running the code enclosed in this link, but I do not know how to obtain Figure 6.3,…

r lda topic-modeling

asked Mar 02 '20 at 21:12

Mark

-1

votes

2 answers

Error "too many values to unpack" when trying to get similarities in Gensim using LDA model

I'm using an Anaconda environment with Python 3.7 and gensim 3.8.0. I have my data as a dataframe that I separated into a test and a training set; they both have this structure: X_test and Xtrain dataframe format : id …

python gensim similarity recommendation-engine lda

asked Oct 23 '19 at 11:57

brandata

-1

votes

2 answers

Why do different runs of the same iteration produce different results?

I've created a dictionary with the document-topic probabilities from a Gensim LDA model. Each iteration over the dictionary (even with the same exact code) produces slightly different values. Why is this? (Note, when the same code is copied and…

python pandas loops gensim lda

asked Sep 29 '19 at 07:57

Dror M

-1

votes

1 answer

How can I get the topic coherence score of two models and then use it for comparison?

I want to get the topic coherence for an LDA model. Let's say I have two LDA models, one built on a bag of words and the other on a bag of phrases. How can I get the coherence for these two models and then compare them on the basis of coherence?

lda topic-modeling

asked Apr 15 '19 at 19:37

user3778289
