Questions tagged [latent-semantic-analysis]

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms; the concepts are usually obtained from a truncated singular value decomposition (SVD) of a term-document matrix. Use this tag for questions related to the natural language processing technique.
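
A minimal sketch of the idea in Python with NumPy, assuming a tiny invented term-document count matrix and no tf-idf weighting (real pipelines usually weight the counts first): a rank-k SVD places terms and documents in the same k-dimensional concept space, where documents that use related vocabulary end up close together. The helper names lsa and cosine are hypothetical.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are documents.
# Vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [2, 1, 0, 0],  # car
    [1, 2, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 2, 1],  # flower
    [0, 0, 1, 2],  # petal
], dtype=float)


def lsa(X, k):
    """Rank-k LSA: keep the k largest singular triplets of X = U S V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    term_vecs = U[:, :k] * s[:k]   # terms x k, scaled by singular values
    doc_vecs = Vt[:k].T * s[:k]    # documents x k, scaled by singular values
    return term_vecs, doc_vecs, s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


term_vecs, doc_vecs, s = lsa(X, k=2)

# Documents 0 and 1 share car-related terms; documents 0 and 2 share none.
print("cos(doc0, doc1):", round(cosine(doc_vecs[0], doc_vecs[1]), 3))
print("cos(doc0, doc2):", round(cosine(doc_vecs[0], doc_vecs[2]), 3))

# Terms with the largest absolute loading on each concept.
for c in range(term_vecs.shape[1]):
    top = np.argsort(-np.abs(term_vecs[:, c]))[:2]
    print("concept", c, [terms[i] for i in top])
```

On this toy matrix the two car documents come out with a cosine of essentially 1 and the car/flower pair essentially 0, because the two vocabularies never co-occur; on real corpora the number of concepts k is a tuning choice.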

30 questions
1 vote, 1 answer

How to generate recommendation with matrix factorization

I've read some papers on Matrix Factorization (Latent Factor Model) in recommendation systems, and I can implement the algorithm. I get RMSE results on the MovieLens dataset similar to what the papers report. However, I find that, if I try to…
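
The excerpt is truncated, so this does not address the asker's actual problem. It only sketches the step that turns a trained latent factor model into recommendations, assuming user factors P and item factors Q have already been learned: score every item by the dot product of the user's and the item's factor vectors, skip items the user has already rated, and return the highest-scoring rest. The function name recommend and the convention that a 0 in R means "unrated" are assumptions of this sketch.

```python
import numpy as np


def recommend(P, Q, R, user, n=3):
    """Top-n item indices for `user` from a trained latent factor model.

    P: users x k user factors, Q: items x k item factors (both assumed to be
    already learned, e.g. by SGD or ALS). R: users x items ratings, with 0
    meaning "not rated" (an assumption of this sketch).
    """
    scores = np.asarray(P[user] @ Q.T, dtype=float)  # predicted rating per item
    scores[R[user] > 0] = -np.inf                     # skip items already rated
    order = np.argsort(-scores, kind="stable")
    # Only keep finite scores, in case n exceeds the number of unrated items.
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]


# Hypothetical factors for 2 users and 5 items with k = 2.
P = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.7, 0.0],
              [0.0, 0.5]])
R = np.array([[5, 0, 0, 0, 0],
              [0, 0, 4, 0, 0]])

print(recommend(P, Q, R, user=0))  # → [1, 3, 2]
print(recommend(P, Q, R, user=1))  # → [4, 1, 0]
```

Ranking by predicted score rather than by RMSE is the point here: a model can have a good RMSE and still need this masking step so that it does not simply re-recommend items the user has already rated.
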
1 vote, 1 answer

probabilistic latent semantic analysis R

Is there a package that supports probabilistic latent semantic analysis for R? I found the LSA package, but is there one that specifically performs pLSA? Thanks.
TomR
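
The question is about R and no package is named here. As a language-neutral illustration of what such a package computes, this is a minimal NumPy sketch of the standard EM updates for pLSA, where P(w|d) = Σ_z P(z|d) P(w|z). The function plsa and its parameters are hypothetical, and the dense docs × words × topics array limits it to small corpora.

```python
import numpy as np


def plsa(N, k, n_iter=50, seed=0):
    """Fit pLSA, P(w|d) = sum_z P(z|d) P(w|z), to a docs x words count matrix N by EM.

    Returns P(z|d) (docs x k), P(w|z) (k x words) and the log-likelihood
    sum_{d,w} n(d,w) log P(w|d) after each iteration.
    """
    N = np.asarray(N, dtype=float)
    rng = np.random.default_rng(seed)
    D, W = N.shape
    tiny = 1e-300  # guards against division by zero and log(0)
    p_z_d = rng.random((D, k))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((k, W))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]   # docs x words x k
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), tiny)
        # M-step: re-estimate both distributions from expected counts n(d,w) P(z|d,w).
        expected = N[:, :, None] * post
        p_w_z = expected.sum(axis=0).T                     # k x words
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), tiny)
        p_z_d = expected.sum(axis=1)                       # docs x k
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), tiny)
        p_w_d = p_z_d @ p_w_z                              # docs x words
        history.append(float((N * np.log(np.maximum(p_w_d, tiny))).sum()))
    return p_z_d, p_w_z, history


# Hypothetical counts: documents 0-1 mostly use words 0-1, documents 2-3 words 2-3.
N = np.array([[4, 3, 0, 0],
              [3, 4, 1, 0],
              [0, 0, 5, 4],
              [0, 1, 4, 5]])
p_z_d, p_w_z, history = plsa(N, k=2)
print(np.round(p_z_d, 2))
print(history[0], history[-1])
```

Each EM iteration cannot decrease the log-likelihood, so history should be non-decreasing; different seeds can still converge to different local optima.
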
0 votes, 0 answers

Tensor Decomposition and Label-Weight Assignment in Python

I have a tensor with dimensions 4149x1000, representing 4149 images, each characterized by 1000 features. Additionally, there are 101 labels, and while there are 4149 images, these labels are not one-to-one mapped to images. Instead, there is…
0 votes, 1 answer

Extracting word features from BERT model

So as you know, we can extract BERT features for words in a sentence. My question is: can we also extract features for words that are not part of a sentence? For example, BERT features of single words such as "dog", "human", etc.
Kadaj13
0 votes, 0 answers

nltk latent semantic analysis copies the first topics over and over

This is my first attempt at Natural Language Processing, so I started with Latent Semantic Analysis and used this tutorial to build the algorithm. After testing it, I see that it only classifies the first semantic words and repeats the same terms…
0 votes, 0 answers

Unsupervised commands classification

How can I cluster commands such as /bin/busybox chmod 777 /dvrHelper without using a Bag-of-Words representation? Could models like LDA or Word2vec be useful for my goal?
0 votes, 1 answer

Is it possible to set the initial topic assignments for scikit-learn LDA?

Instead of setting the topic_word_prior as a parameter, I would like to initialize the topics according to a pre-defined distribution over words. How would I set this initial topic distribution in sklearn's implementation? If it's not possible, is…
0 votes, 1 answer

Which formula of tf-idf does the LSA model of gensim use?

There are many different ways in which tf and idf can be calculated. I want to know which formula is used by gensim in its LSA model. I have been going through its source code lsimodel.py, but it is not obvious to me where the document-term matrix…
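
As far as I know, gensim's LsiModel does not compute tf-idf itself: it factorizes whatever corpus it is given, which in the gensim tutorials is usually the output of TfidfModel. Assuming TfidfModel's documented defaults (raw term count times log base 2 of the number of documents over the term's document frequency, then L2 normalization of each document vector), the weighting can be reproduced in NumPy as below. Treat this as a sketch to check against the TfidfModel documentation for your gensim version, not as gensim's own code; tfidf_log2_l2 is a hypothetical name.

```python
import numpy as np


def tfidf_log2_l2(counts):
    """tf * log2(n_docs / df) weighting of a docs x terms count matrix,
    followed by L2 normalization of each document row.

    Assumed to mirror gensim's TfidfModel defaults; not gensim's own code.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = (counts > 0).sum(axis=0)                   # documents containing each term
    idf = np.log2(n_docs / np.maximum(df, 1))       # max() only guards unused terms
    weights = counts * idf                          # raw term count used as tf
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    return weights / np.where(norms > 0, norms, 1.0)


counts = [[1, 1, 0],
          [0, 1, 2]]
print(tfidf_log2_l2(counts))
```

A term that occurs in every document gets idf log2(1) = 0 under this formula, so it drops out of the weighted vectors entirely (the middle column above).
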
0 votes, 1 answer

Latent Semantic Indexing with gensim

To use the Latent Semantic Indexing method from gensim, I want to begin with a small "classic" example like: import logging, gensim, bz2 id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt') mm =…
0 votes, 1 answer

Latent Semantic Analysis and Stemming

Assume a very large corpus of any inflected language. Does the following make sense? By applying LSA to such a corpus, words with similar concepts converge together in vector space, so inflected word forms referring to the same concept should…
L D
0 votes, 1 answer

Finding Semantic Coherence between sentences in a text

I need some help writing a program, based on the code from these links (link1 and link2), that will automatically calculate the semantic similarity between (a) consecutive sentences and (b) sentences separated by one intervening phrase, in an entire…
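
Once each sentence has a vector (for example its LSA representation; how to build those vectors is left open here, and the linked code is not reproduced), both measurements reduce to cosine similarities between rows at a fixed distance. A minimal NumPy sketch, assuming the intervening unit in (b) is a single sentence so that (b) means a gap of 2; coherence is a hypothetical helper name.

```python
import numpy as np


def coherence(vectors, gap=1):
    """Cosine similarity between sentence i and sentence i + gap.

    `vectors` is a sentences x dims array, for example the LSA vector of each
    sentence. gap=1 compares consecutive sentences; gap=2 compares sentences
    with exactly one sentence between them.
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    V = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(V, axis=1)
    norms[norms == 0] = 1.0                      # leave all-zero rows as zero vectors
    U = V / norms[:, None]
    return np.sum(U[:-gap] * U[gap:], axis=1)


# Hypothetical 2-D sentence vectors for a four-sentence text.
V = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
adjacent = coherence(V, gap=1)   # pairs (0,1), (1,2), (2,3)
skip_one = coherence(V, gap=2)   # pairs (0,2), (1,3)
print(adjacent, adjacent.mean())
print(skip_one, skip_one.mean())
```

Averaging each array gives one coherence score per text for each distance.
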
0 votes, 1 answer

choose the proper clustering method for Latent Semantic Analysis

I want to cluster some text documents to find the documents with the same concept. I've computed the semantic similarity using Latent Semantic Analysis (LSA), but I'm not sure which clustering method I should choose for my purpose. Thank you.
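
There is no single correct choice, but one common option is k-means on length-normalized LSA vectors: for unit vectors the squared Euclidean distance equals 2 − 2·cos, so the clusters follow cosine similarity, the measure usually used to compare LSA vectors. This minimal NumPy sketch assumes the documents are already rows of an LSA matrix and that the number of clusters k is fixed in advance; kmeans_cosine is a hypothetical name, and hierarchical clustering is a reasonable alternative when k is unknown.

```python
import numpy as np


def kmeans_cosine(X, k, n_iter=100, seed=0):
    """k-means on length-normalized rows of X (e.g. LSA document vectors)."""
    X = np.asarray(X, dtype=float)
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct documents
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assign each document to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                            # assignments are stable
        labels = new_labels
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers


# Hypothetical 2-D LSA vectors for five documents.
docs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8], [5.0, 0.4]]
labels, _ = kmeans_cosine(docs, k=2)
print(labels)
```

Document 4 is much longer than the others but points in the same direction as documents 0 and 1, so normalization places it in their cluster.
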
0 votes, 1 answer

LSA Similarity interface

I am a PhD student in translation studies, currently working on my dissertation. I am using the LSA Similarity interface as a method of analysis in my dissertation. My background is in linguistics, not computer science. I tried to find an…
0 votes, 1 answer

How to do Latent Semantic Analysis on a very large dataset

I am trying to run LSA or principal component analysis on a very large dataset (about 50,000 documents and over 300,000 words/terms) to reduce the dimensionality so I can plot the documents in 2-D. I have tried in Python and in MATLAB, but my…
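
A dense 50,000 × 300,000 matrix of float64 values needs 50,000 × 300,000 × 8 bytes = 120 GB, so the usual approach is to keep the document-term matrix sparse and compute only the top k singular vectors with a randomized truncated SVD (the method scikit-learn's TruncatedSVD uses by default). The minimal NumPy sketch below touches A only through @ and .T, so it should also accept a scipy.sparse matrix, though the example only exercises a dense array; approx_truncated_svd is a hypothetical helper, not a library function. Unlike PCA it does not mean-center the data, since centering would make a sparse matrix dense.

```python
import numpy as np


def approx_truncated_svd(A, k, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top-k SVD of A (docs x terms) by randomized range finding.

    A is used only through `@` and `.T`. No mean-centering is done.
    """
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)            # orthonormal basis for a sample of A's range
    for _ in range(n_power_iter):             # improves accuracy when singular values decay slowly
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = (A.T @ Q).T                           # small (k + oversamples) x terms matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


# Stand-in data: a random rank-5 matrix with 60 "documents" and 40 "terms".
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
U, s, Vt = approx_truncated_svd(A, k=2)
coords = U * s                                # one (x, y) point per document
print(coords.shape)                           # → (60, 2)
```

For a 2-D plot, k=2 and the rows of U * s give one (x, y) point per document.
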
0 votes, 1 answer

Is there a memory implementation of the SparseVectorsFromSequenceFiles, RowIdJob and RowSimilarityJob jobs

I've been working on performing Latent Semantic Analysis using the SparseVectorsFromSequenceFiles, RowIdJob and RowSimilarityJob Hadoop jobs provided by Mahout, which run Map/Reduce jobs. I've been trying to find an equivalent implementation for…
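
Only as a rough in-memory analogue, assuming cosine similarity (RowSimilarityJob supports several measures) and a collection small enough that the full docs × docs similarity matrix fits in RAM: the rows of a document-term matrix play the role of the sparse vectors, the row index plays the role of the integer ids RowIdJob assigns, and the pairwise similarities replace RowSimilarityJob's output. row_similarity is a hypothetical name, and none of Mahout's weighting or pruning options are reproduced.

```python
import numpy as np


def row_similarity(X, top_n=2):
    """For each row of X (docs x terms), the top_n most cosine-similar other rows.

    Returns one list of (row_index, similarity) pairs per row. Computes the full
    docs x docs similarity matrix in memory, so it suits modest collections.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms > 0, norms, 1.0)     # unit-length rows
    S = U @ U.T                                 # all pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)                # a row is not its own neighbor
    result = []
    for i in range(S.shape[0]):
        order = np.argsort(-S[i], kind="stable")[:top_n]
        result.append([(int(j), float(S[i, j])) for j in order if np.isfinite(S[i, j])])
    return result


# Hypothetical term-count rows; the row index stands in for the integer row id.
X = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [0, 1, 1]]
for i, neighbors in enumerate(row_similarity(X)):
    print(i, neighbors)
```
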