Questions tagged [lsa]

LSA stands for Latent Semantic Analysis, a natural language processing technique that analyses the relationships between documents and the terms they contain by producing a set of concepts related to both.
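
As a rough illustration of the description above, here is a minimal sketch of LSA using a truncated SVD from NumPy. The vocabulary, the counts, the choice of `k=2`, and the helper names `lsa_document_vectors` and `cosine` are all made up for this example; real pipelines usually start from a TF-IDF matrix and use a sparse SVD routine.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# The vocabulary and counts are invented purely for illustration.
terms = ["car", "engine", "wheel", "fruit", "apple", "banana"]
A = np.array([
    [2, 1, 0, 0],   # car
    [1, 2, 0, 0],   # engine
    [1, 1, 0, 0],   # wheel
    [0, 0, 3, 1],   # fruit
    [0, 0, 1, 2],   # apple
    [0, 0, 1, 1],   # banana
], dtype=float)


def lsa_document_vectors(matrix, k):
    """Project documents into a k-dimensional concept space via truncated SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ Vt[:k]).T


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


doc_vectors = lsa_document_vectors(A, k=2)

# Documents 0 and 1 share the "vehicle" terms; document 2 uses only "fruit" terms.
sim_same_topic = cosine(doc_vectors[0], doc_vectors[1])
sim_cross_topic = cosine(doc_vectors[0], doc_vectors[2])
print(sim_same_topic, sim_cross_topic)
```

In this toy setup the two vehicle documents end up pointing in the same direction of the concept space, while the vehicle and fruit documents are orthogonal. On real data the separation is far less clean, and cosine similarities between reduced vectors can be negative.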

For the Microsoft Windows subsystem, see [local-security-authority].

126 questions
3
votes
1 answer

How to handle negative values of cosine similarities

I computed the tf-idf of my documents based on terms. Then, I applied LSA to reduce the dimensionality of the terms. 'similarity_dist' contains values which are negative (see table below). How can I compute cosine distance with the range…
kitchenprinzessin
  • 1,023
  • 3
  • 14
  • 30
3
votes
1 answer

How to compute word similarity using TF-IDF or LSA with gensim?

I know that word2vec in gensim can compute similarity between words. But now I want to compute word similarity using TF-IDF or LSA with gensim. How can I do this? Note: Computing document similarity using LSA with gensim is easy:…
hankaixyz
  • 96
  • 1
  • 5
3
votes
1 answer

Calling AuditQuerySystemPolicy() (advapi32.dll) from C# returns "The parameter is incorrect"

The sequence is as follows: open a policy handle with LsaOpenPolicy() (not shown); call LsaQueryInformationPolicy() to get the number of categories; for each category, call AuditLookupCategoryGuidFromCategoryId() to turn the enum value into a…
JCCyC
  • 16,140
  • 11
  • 48
  • 75
3
votes
2 answers

Singular Value Decomposition: Different results with Jama, PColt and NumPy

I want to perform Singular Value Decomposition on a large (sparse) matrix. In order to choose the best (most accurate) library, I tried replicating the SVD example provided here using different Java and Python libraries. Strangely, I am getting…
user2588219
  • 31
  • 1
  • 2
3
votes
1 answer

Applying a function between specific pairs of columns in a matrix in R

I am generating a matrix using the lsa package in R. After the matrix is created, I would like to calculate the cosine similarity between specific pairs of documents (columns) in the matrix. Currently, I am doing this with nested for-loops, and it…
E. Moritz
  • 51
  • 1
  • 6
3
votes
1 answer

pLSA implementation for sparse matrix

I'm trying to implement the pLSA algorithm proposed by Thomas Hofmann (1999). However, all the implementations I have found treat the input term-doc matrix as dense instead of sparse. Since my input matrix is quite large and sparse, I would…
Jia
  • 1,301
  • 1
  • 12
  • 18
3
votes
1 answer

Latent Semantic Analysis/Indexing Library for C++

Is there a C++ library for LSA/LSI? Preferably under an MIT, BSD, Apache, or similar license; no GPL.
snøreven
  • 1,904
  • 2
  • 19
  • 39
2
votes
0 answers

hklm\Security Vs Security\Policy

I am researching how an attacker would get a machine's credentials. I figured the most common method is to dump hklm\sam, hklm\security, and hklm\system. I was able to figure out what information is stored in the SAM and why I would want to save it,…
Knightwish
  • 51
  • 1
  • 4
2
votes
0 answers

LSA and K-means in document clustering: results are not printing correctly

I have recently done some document clustering using LSA and then K-means. However, when I try to print the most important words in each cluster, I'm getting very strange results: it is printing words that don't even belong to that cluster. Below is the code and…
Brian Ly
  • 21
  • 1
2
votes
1 answer

How to get the vector representation of a word using a trained SVD model

I have trained (fit and transform) an SVD model using 400 documents as part of my effort to build an LSA model. Here is my code: tfidf_vectorizer = sklearn.feature_extraction.text.TfidfVectorizer(stop_words='english', use_idf=True,…
Pedram
  • 2,421
  • 4
  • 31
  • 49
2
votes
1 answer

Adding documents to gensim model

I have a class wrapping the various objects required for calculating LSI similarity: class SimilarityFiles: def __init__(self, file_name, tokenized_corpus, stoplist=None): if stoplist is None: self.filtered_corpus =…
faerubin
  • 177
  • 12
2
votes
2 answers

Optimal Document Size for LSI Similarity Model

I'm using Gensim's excellent library to compute similarity queries on a corpus using LSI. However, I have a distinct feeling that the results could be better, and I'm trying to figure out whether I can adjust the corpus itself in order to improve…
faerubin
  • 177
  • 12
2
votes
1 answer

Implementing LSA for an Elasticsearch index

I've just spent the last couple of days wrapping my head around implementing Latent Semantic Analysis for documents that are indexed in Elasticsearch. The first step is to build the term-document matrix. So I am thinking of using the Stanford NLP library that take…
Sara
  • 57
  • 1
  • 2
  • 11
2
votes
0 answers

How do I run LSA/SVD on a Spark DataFrame in a Pipeline?

I would like to be able to use the Pipeline functionality of Spark 2.0+ for building my models, but I cannot figure out how to incorporate LSA/SVD in my Pipeline. I am aware of the functionality on RDDs, but I do not believe that can be…
2
votes
0 answers

How to print out the documents in each cluster generated by LDA?

The print_top_words method from the code below only prints the distribution of the words for each topic: Cluster 1: word1 , word2 , .... Cluster 2: word3 , word2 , .... So, instead of printing out the word distribution, I would like to print the…