Questions tagged [lsa]

LSA stands for Latent Semantic Analysis, a natural language processing technique which involves analysing the relationships between documents and terms they contain by producing a set of related concepts.
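As a rough illustration of the description above, here is a minimal sketch of LSA using a truncated SVD in NumPy. The toy vocabulary, the counts, and the choice of k = 2 concepts are assumptions made up for this example; real pipelines usually apply TF-IDF weighting and a sparse solver first.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are term counts.
terms = ["cat", "dog", "pet", "stock", "market"]
X = np.array([
    [2, 1, 1, 0, 0],   # doc 0: about pets
    [1, 2, 1, 0, 0],   # doc 1: about pets
    [0, 0, 0, 2, 1],   # doc 2: about finance
    [0, 0, 0, 1, 2],   # doc 3: about finance
], dtype=float)

k = 2  # number of latent concepts to keep (an assumption for this sketch)

# Reduced SVD: X = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top-k concepts: documents and terms in concept space.
doc_vectors = U[:, :k] * s[:k]
term_vectors = Vt[:k].T

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents on the same topic end up close together in concept space.
print(cosine(doc_vectors[0], doc_vectors[1]))
print(cosine(doc_vectors[0], doc_vectors[2]))
```

With this toy matrix the two pet documents should score close to 1 against each other and close to 0 against the finance documents.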


For the Microsoft Windows subsystem, see (local-security-authority).

126 questions
0
votes
1 answer

Applying LSA on a term-document matrix when the number of documents is very small

I have a term-document matrix (X) of shape (6, 25931). The first 5 documents are my source documents and the last document is my target document. The columns represent counts for the different words in the vocabulary. I want to get the cosine…
Parth
  • 2,682
  • 1
  • 20
  • 39
0
votes
0 answers

interactive plots and zooming in matplotlib

I am using the following code to plot an LSA of some documents. plt.rcParams['figure.figsize'] = [15, 15] svd = TruncatedSVD() Z = svd.fit_transform(X) plt.scatter(Z[:,0], Z[:,1]) for i in range(D): plt.annotate(s=index_word_map[i], xy=(Z[i,0],…
Talal Ghannam
  • 189
  • 2
  • 17
0
votes
1 answer

Calculating semantic coherence in a given speech transcript

I am trying to calculate the semantic coherence of a given paragraph/transcript, i.e. whether somebody goes off track while talking about a thing or topic, more specifically while describing a picture (the picture might have many sub-details). For example -…
0
votes
1 answer

Differences between BERT sentence embeddings and LSA embeddings

BERT as a service (https://github.com/hanxiao/bert-as-service) allows extracting sentence-level embeddings. Assuming I have a pre-trained LSA model which gives me a 300-dimensional word vector, I am trying to understand in which scenario would an…
Samarth
  • 242
  • 2
  • 12
0
votes
2 answers

ValueError: shapes (4,4) and (3,) not aligned: 4 (dim 1) != 3 (dim 0)

import numpy as np A = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]) u, s, vt = np.linalg.svd(A) print (np.dot(u, np.dot(np.diag(s), vt))) I use numpy to create the matrix and it shows…
aloha
  • 23
  • 1
  • 5
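For the shape mismatch in the excerpt above, a minimal sketch of one common way to rebuild a 4×3 matrix from its SVD. By default np.linalg.svd returns the full 4×4 u, which does not line up with the 3 singular values; passing full_matrices=False returns the reduced factors instead. Using np.array rather than np.matrix is a choice made for this sketch, not something taken from the question.

```python
import numpy as np

# Same 4x3 matrix as in the excerpt, as a plain ndarray.
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Reduced SVD: u is (4, 3), s is (3,), vt is (3, 3).
u, s, vt = np.linalg.svd(A, full_matrices=False)

# With matching shapes the product reconstructs A.
A_rec = u @ np.diag(s) @ vt

print(u.shape, s.shape, vt.shape)
print(np.allclose(A, A_rec))
```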
0
votes
1 answer

Taking a latent semantic analysis (lsa) object and scoring on new data in R

I am running latent semantic analysis (LSA) using textmineR in R. What I'm hoping to get is the document-by-topic matrix with topic scores by document, which I can do by calling theta from my lsa object (below). However, I am running into…
Drew
  • 135
  • 4
  • 11
0
votes
0 answers

Forming a query vector in LSA

After performing the SVD of a term-document matrix and getting a reduced-rank matrix, various sources have stated the following reduced query vector formula. It seems easy to see how it's derived. However, in this link, the query vector is…
0
votes
0 answers

How to retrieve only nouns from a file and pass them as an array to LSA?

I need to extract only those words whose tags match the pos-tags variable of the program and pass those words to an LSI model, but when I print nouns I get an empty list. Here is my sample input of noun file: ['All,DT', 'praise,NN', 'is,VBZ', 'due,JJ',…
Nisa
  • 227
  • 3
  • 10
0
votes
1 answer

gensim document similarity: how to get document titles from most similar results?

I am using gensim to analyze document similarity in a large corpus. Each document has a "title", or more specifically, a unique ID string, along with the content text. After looking through several tutorials about topic modeling, indexing and…
tony_tiger
  • 789
  • 1
  • 11
  • 25
0
votes
0 answers

How to find and reduce similar words in a column of lists in Python using nltk?

I have a column in pandas as below 0 ['business', 'ceremony', 'festival', 'group'] 1 ['mountain', 'outdoors', 'travel', 'tree', 'forest'] 2 ['people', 'city', 'politics', 'architecture'] 3 ['people', 'politics', 'protest',…
Praga
  • 1
0
votes
1 answer

Scala Convert [Seq[string] to [String]? (TF-IDF after lemmatization)

I am trying to learn Scala and specifically text mining (lemmatization, TF-IDF matrix and LSA). I have some texts I want to lemmatize and classify (LSA). I use Spark on Cloudera. So I used the Stanford CoreNLP function: def…
So ode
  • 31
  • 4
0
votes
0 answers

How do I calculate cosine similarity using JAMA?

Could anyone help me find the problem? I need to calculate the similarity between the query and a collection of documents, and I've been using this program: https://github.com/aliabbasrizvi/LatentSemanticIndexing. In this program, the…
nani je
  • 5
  • 6
0
votes
0 answers

java.lang.NoClassDefFoundError: org/apache/lucene/index/CorruptIndexException

I am trying to implement LSA semantic search using the TML library. Here is my code, where rep1 is a folder that I created and dossier is a folder where I put my txt documents. public static void main(String[] args) throws Exception { Repository…
Sara
  • 57
  • 1
  • 2
  • 11
0
votes
1 answer

How to access individual documents in textmatrix in R

I have a textmatrix in R that looks like the following: I am trying to create one textmatrix from training and testing data. How can I access the different document columns to put into another textmatrix?
cody
  • 129
  • 2
  • 10
0
votes
1 answer

Latent Semantic Analysis and Stemming

Assume a very large corpus of any inflective language. Does the following make sense? By applying LSA to such a corpus, words with similar concepts converge together in vector space, thus inflected word forms referring to the same concept should…
L D
  • 593
  • 1
  • 3
  • 16