Questions tagged [lda]

Latent Dirichlet Allocation (LDA) is a generative model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar.

If the observations are words collected into documents, LDA posits that each document is a mixture of a small number of topics and that each word is attributable to one of the document's topics. In other words, LDA represents documents as mixtures of topics, and each topic generates words with certain probabilities.

It should not be confused with Linear Discriminant Analysis, a supervised learning procedure for classifying observations into a set of categories.
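The generative story in the description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a fitted model: the vocabulary, the two topics, the `alpha` values, and the helper names `sample_dirichlet` and `generate_document` are all made up for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocab, alpha, n_words, rng):
    """Generate one document following the LDA generative story."""
    # Per-document topic mixture: theta ~ Dirichlet(alpha)
    theta = sample_dirichlet(alpha, rng)
    assignments, words = [], []
    for _ in range(n_words):
        # Choose a topic for this word position, then a word from that topic
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Hypothetical toy setup: 2 topics over a 4-word vocabulary
VOCAB = ["goal", "match", "vote", "election"]
TOPIC_WORD = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: sports-like words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: politics-like words
]

rng = random.Random(0)
theta, z, doc = generate_document(
    TOPIC_WORD, VOCAB, alpha=[0.5, 0.5], n_words=8, rng=rng
)
print(theta)
print(list(zip(z, doc)))
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and the topic-word distributions.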

1175 questions
-1
votes
1 answer

Combine Word Embeddings with topic-word distribution from LDA for text summarization

I'm a newbie in NLP and I was wondering if it is a good idea to summarize a document that has already been classified into a certain topic through methods such as LDA by considering the Word Embedding retrieved from Word2Vec and the topic-word…
-1
votes
1 answer

LDA: Assign more than one topic to a document

I'm new to LDA and doing some experiments with Python + LDA and some sample datasets. I already got some very interesting results and now I asked myself a question but couldn't find an answer so far. Since I worked with customer reviews/ratings of a…
Nicson
  • 15
  • 4
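Because LDA already describes each document as a mixture of topics, one common way to assign more than one topic is to keep every topic whose probability clears a cutoff. The sketch below is an assumption about what the asker might want, not an answer taken from the thread: `topics_above`, the threshold of 0.2, and the sample mixture are hypothetical, and the input is assumed to be a list of (topic_id, probability) pairs for one document.

```python
def topics_above(doc_topics, threshold=0.2):
    """Return (topic_id, probability) pairs at or above threshold, highest first."""
    kept = [(topic, prob) for topic, prob in doc_topics if prob >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up topic mixture for a single review
doc_topics = [(0, 0.55), (1, 0.05), (2, 0.30), (3, 0.10)]

assigned = topics_above(doc_topics, threshold=0.2)
print(assigned)
```

The threshold is a tuning choice: a lower value assigns more topics per document, a higher one approaches single-topic assignment.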
-1
votes
3 answers

Return None in function: TypeError: object of type 'NoneType' has no len()

I am trying to print my topics and texts from each topic in LDA. But a None after printing the topics is disrupting my script. I can print my topics but not the texts. import pandas import numpy as np from sklearn.feature_extraction.text import…
marin
  • 923
  • 2
  • 18
  • 26
-1
votes
1 answer

Python - IndexError: list index out of range (topic modeling)

I've come across a lot of similar questions. However, the answers provided did not help me. I'm trying to run a topic modeling analysis on about 8,000 media articles, but I'm getting this error: Traceback (most recent call last): …
-1
votes
1 answer

Classification LDA vs. TFIDF

I was running multi-label classification on text data and noticed that TFIDF outperformed LDA by a large margin. TFIDF accuracy was around 50% and LDA was around 29%. Is this expected, or should LDA do better than this?
MikeAlbert
  • 164
  • 3
  • 11
-1
votes
2 answers

LDA python library not taking sparse matrix as input

I am trying to use the lda 1.0.2 package for Python. The documentation says that sparse matrices are acceptable, but when I pass a sparse matrix to the transform() function, it throws the error The truth value of an array with more than one element…
-1
votes
1 answer

How can I perform LDA (latent Dirichlet allocation) on Noun Phrases in R instead of words?

I want to generate topics from my text at the level of phrases, rather than at the level of words using LDA (latent Dirichlet allocation). How can I do that in R? LDA interprets the documents as bag-of-words and produces topics with constituting…
carora3
  • 466
  • 1
  • 5
  • 19
-1
votes
1 answer

How to plot log.likelihoods for each iteration in R using LDA package?

My problem is that I want to plot the log.likelihoods gathered from LDA execution in R using the LDA package. My code is: K <- 10 ## Num clusters result <- lda.collapsed.gibbs.sampler(cora.documents, K, ## Num…
-2
votes
1 answer

How to remove error too many values to unpack (expected 2)

I applied an LDA model using TF-IDF, and now I want to evaluate its performance by classifying a sample document using the LDA TF-IDF model. Code: for index, score in sorted(lda_model_tfidf[corpus], key=lambda tup: -1*tup[1]): print("\nScore: {}\t \nTopic:…
-2
votes
1 answer

Calculating LDA in matlab

I have written the following code: %LDA file = xlsread('LDA.xlsx'); Graph=[]; for c=1:840 for i=1:17 for j=18:34 Graph=[Graph,file(i,c),file(j,c)]; end end end lda=resubLoss(Graph) but the function resubLoss does…
Yasmin
  • 13
  • 6
-2
votes
2 answers

ROC curve in linear discriminant analysis with R

I want to compute the ROC curve and then the AUC from the linear discriminant model. Do you know how I can do this? Here is the code: ##LDA require(MASS) library(MASS) lda.fit = lda(Negative ~., trainSparse) lda.fit plot(lda.fit) ###prediction…
mac gionny
  • 333
  • 1
  • 3
  • 8
-2
votes
1 answer

Different dimensions of distributions of topics

I would like to divide all documents into 10 topics, and this works well with a converged result, except for the dimensions of the topic distributions and the topic covariance matrix. Why is the topic distribution a 9-dimensional vector instead of 10, and their…
Jeffy
  • 121
  • 1
  • 2
  • 10
-2
votes
2 answers

IndentationError: expected an indented block when trying to reproduce LDA for a document

I am trying to obtain the LDA distribution for the first article of my collection, but I am running into several errors. My collection, doc_set, is a pandas.core.series.Series. Whenever I wanted to run the simple…
Economist_Ayahuasca
  • 1,648
  • 24
  • 33
-2
votes
2 answers

bag-of-words approach / tools / library for C++?

I have a folder that contains many .txt documents of tourism reviews. I want to use the bag-of-words approach to convert them to some kind of numeric representation for machine learning (Latent Dirichlet Allocation - LDA) in C++ to train the…
-2
votes
2 answers

Non-GPL Open Source Latent Dirichlet Allocation Implementation/Library in C/C++

I know some implementations (mainly from this question), but they all seem to be published under the GPL. Are there any (platform-independent) implementations without the GPL restrictions?
snøreven
  • 1,904
  • 2
  • 19
  • 39