Questions tagged [latent-semantic-indexing]

Latent semantic indexing is an indexing and retrieval method.

Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words used in the same contexts tend to have similar meanings. A claimed feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between terms that occur in similar contexts.
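As a rough illustration of the idea, here is a minimal sketch using NumPy on a tiny, invented term-document matrix. The terms, documents, the rank k = 2, and the helper names `fold_in` and `cosine` are all assumptions made up for this example; real collections usually apply term weighting (e.g. TF-IDF) and a much larger k. The query "car" is folded into the reduced space and compared with every document, including one that only mentions "automobile".

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The data is invented purely for illustration.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k strongest "concepts" (k = 2 is an assumption for this toy data).
k = 2
U_k, s_k = U[:, :k], s[:k]
V_k = Vt[:k, :].T  # one row per document in the reduced space


def fold_in(term_vector):
    """Project a term-space vector into the k-dimensional concept space."""
    return (U_k.T @ term_vector) / s_k


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Query containing only the term "car".
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0
q_hat = fold_in(query)

# Similarity of the query to each document in concept space.
sims = [cosine(q_hat, V_k[j]) for j in range(A.shape[1])]

# Plain term overlap, for comparison: document 1 never mentions "car".
raw_overlap = query @ A

for j, sim in enumerate(sims):
    print(f"doc {j}: raw overlap = {raw_overlap[j]:.0f}, LSI cosine = {sim:.3f}")
```

In this toy case the "automobile engine" document shares no term with the query, yet it lands close to it in the reduced space because both co-occur with "engine"; the two flower documents stay unrelated.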

52 questions
0 votes, 1 answer

How to incorporate features from a latent semantic analysis as independent variables in a predictive model

I am trying to run logistic regression using text data in R. I have built a term document matrix and a corresponding latent semantic space. In my understanding, LSA is used in deriving 'concepts' out of 'terms' which could help in dimension…
0 votes, 1 answer

Latent Semantic Indexation with gensim

In order to use the Latent semantic indexation method from gensim, I want to begin with a small "classique" example like: import logging, gensim, bz2 id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt') mm =…
0 votes, 1 answer

Latent Semantic Indexing

I'm trying to find out how to carry out the multiplication of the matrices produced after SVD implementation in LSI. I need this for my research. I want to carry out document clustering.
Rochetta
0 votes, 1 answer

Choose the proper clustering method for Latent Semantic Analysis

I want to cluster some text documents to find the documents that share the same concept. I have computed semantic similarity using Latent Semantic Analysis (LSA), but I am confused about which clustering method I should choose for my purpose. Thank you
0 votes, 1 answer

LSA Similarity interface

I am a PhD student in translation studies and I am currently working on my dissertation. I am using LSA Similarity interface as a method of analysis in my dissertation. My background is in linguistics and not computer science. I tried to find an…
0 votes, 1 answer

SVD output interpretation in Mahout

I am trying to run an SVD job in Mahout. I have created a matrix (say A) (document x term) of size 372053 x 21338 (N = 21338 unique words, M = 372053 documents). So my matrix A is of size M x N. I ran the SVD using Mahout and I got the…
Dinesh
0 votes, 2 answers

Latent Semantic Indexing

It is said that the matrices LSI produces (U, A and V) bring together documents that contain synonyms. For example, if we search for "car", we also get documents that contain "automobile". But LSI is nothing but manipulations of…
avd