Questions tagged [latent-semantic-indexing]

Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A claimed feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts.
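The pipeline described above (term-document matrix, SVD, keep only the strongest concepts) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production LSI implementation: the toy corpus and the choice k = 2 are hypothetical, and real systems usually weight the counts (for example with tf-idf) and use a sparse truncated SVD.

```python
import numpy as np

# Hypothetical toy corpus: each document is already tokenized.
docs = [
    ["police", "crime", "arrest"],
    ["crime", "police", "court"],
    ["mountain", "snow", "ski"],
    ["mountain", "hiking", "snow"],
]

# Term-document count matrix A: one row per term, one column per document.
vocab = sorted({term for doc in docs for term in doc})
row_of = {term: i for i, term in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for term in doc:
        A[row_of[term], j] += 1

# SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k strongest latent concepts.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Rank-k approximation of A and document coordinates in concept space.
A_k = U_k @ np.diag(s_k) @ Vt_k
doc_vectors = (np.diag(s_k) @ Vt_k).T  # row j = document j

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(cosine(doc_vectors[0], doc_vectors[1]))  # documents on the same topic
print(cosine(doc_vectors[0], doc_vectors[2]))  # documents on unrelated topics
```

Documents 0 and 1 share vocabulary and point in the same direction in concept space, while documents 0 and 2 share no terms and end up orthogonal.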

52 questions
1 vote, 1 answer

LSA - steps after finding the SVD

I have read quite a few tutorials since this morning. My problem involves finding the similarity between two documents, and I want to use LSA in Java for this. I understood the creation of the term-document matrix and then the…
CTsiddharth
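One common set of steps after the SVD, sketched with NumPy (the question targets Java, but the same linear algebra maps onto any Java matrix library): treat the rows of V_k as document coordinates, "fold in" a new document q with S_k^-1 U_k^T q so it lands in the same coordinates, and compare documents by cosine similarity. The count matrix and the helper names lsa_fit and fold_in are hypothetical. Some tutorials scale the coordinates by S_k before comparing; that variant is also common and shifts the scores somewhat.

```python
import numpy as np

def lsa_fit(A, k):
    """Truncated SVD of a term-document matrix A (rows = terms, columns = documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(q, U_k, s_k):
    """Map a term-count vector q into the k-dimensional space: S_k^-1 U_k^T q."""
    return (U_k.T @ q) / s_k

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical 5-term x 4-document count matrix.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

U_k, s_k, Vt_k = lsa_fit(A, k=2)
doc_latent = Vt_k.T  # row j = document j in the latent space

# Folding in an existing column reproduces that document's coordinates,
# so new documents and queries can be compared with the same function.
new_doc = np.array([1, 1, 0, 0, 0], dtype=float)
q = fold_in(new_doc, U_k, s_k)
print([round(cosine(q, d), 3) for d in doc_latent])
print(cosine(doc_latent[0], doc_latent[1]))
```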
1 vote, 1 answer

Cosine similarity between two dictionary values

I have this dict called queries: {'q1': ['similar', 'law', 'must', 'obey', 'construct', 'aeroelast', 'model', 'heat', 'high', 'speed', 'aircraft'], 'q2': ['structur', 'aeroelast', 'problem', 'associ', 'flight', …
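Each value in that dictionary is a bag of stemmed tokens, so one direct reading is cosine similarity between term-count vectors. A minimal pure-Python sketch; the q2 list contains only the tokens visible in the excerpt (the original list is cut off), and raw counts are used rather than tf-idf or LSI weights.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_bow(tokens_a, tokens_b):
    """Cosine similarity between two token lists treated as term-count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction
    return dot / (norm_a * norm_b)

queries = {
    "q1": ['similar', 'law', 'must', 'obey', 'construct', 'aeroelast',
           'model', 'heat', 'high', 'speed', 'aircraft'],
    "q2": ['structur', 'aeroelast', 'problem', 'associ', 'flight'],  # truncated in the question
}

# Similarity for every pair of queries.
for (name_a, toks_a), (name_b, toks_b) in combinations(queries.items(), 2):
    print(name_a, name_b, round(cosine_bow(toks_a, toks_b), 4))
```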
1 vote, 2 answers

How to extract semantic relatedness from a text corpus

The goal is to assess semantic relatedness between terms in a large text corpus, e.g. 'police' and 'crime' should have a stronger semantic relatedness than 'police' and 'mountain' as they tend to co-occur in the same context. The simplest approach…
Mulone
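With LSA, one way to score relatedness is to compare terms by their rows of U_k S_k instead of by raw co-occurrence. The NumPy sketch below uses a hypothetical five-document corpus and k = 2. It shows 'police' scoring close to 'crime' and far from 'mountain', and also that 'arrest' and 'court', which never appear in the same document, become related through the context they share.

```python
import numpy as np

# Hypothetical corpus: three "crime" documents and two "mountain" documents.
docs = [
    ["police", "crime", "arrest"],
    ["police", "crime", "court"],
    ["crime", "suspect", "police"],
    ["mountain", "snow", "hiking"],
    ["mountain", "trail"],
]
vocab = sorted({t for d in docs for t in d})
row_of = {t: i for i, t in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for t in d:
        A[row_of[t], j] += 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # row i = term i in concept space (U_k S_k)

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def relatedness(w1, w2, vectors=term_vectors):
    return cosine(vectors[row_of[w1]], vectors[row_of[w2]])

print(relatedness("police", "crime"))
print(relatedness("police", "mountain"))
# Raw co-occurrence says 'arrest' and 'court' are unrelated; the latent space links them.
print(cosine(A[row_of["arrest"]], A[row_of["court"]]), relatedness("arrest", "court"))
```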
1 vote, 0 answers

How can I get the topic scores attributed to a document on gensim LSI?

I am new to Python and ML. I found a nice script (https://www.machinelearningplus.com/nlp/topic-modeling-visualization-how-to-present-results-lda-models/) showing how to get the topics attributed to each document for LDA, and I changed it to be able to…
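For LSI, unlike LDA, the per-topic values are signed projections rather than probabilities, so the "dominant topic" idea from the LDA script needs adapting. A minimal sketch, assuming each document's scores arrive as (topic_id, score) pairs, the form gensim's lsi[bow] transformation returns. Picking the largest magnitude is one possible convention chosen here, not gensim behavior, and the scores are made up.

```python
def dominant_topic(topic_scores):
    """Return the (topic_id, score) pair with the largest absolute score, or None.

    LSI scores can be negative, so magnitude is compared rather than raw value.
    """
    if not topic_scores:
        return None
    return max(topic_scores, key=lambda pair: abs(pair[1]))

# Hypothetical scores for three documents over three topics.
corpus_scores = [
    [(0, 0.91), (1, -0.12), (2, 0.05)],
    [(0, 0.20), (1, -0.75), (2, 0.31)],
    [],
]
for doc_id, scores in enumerate(corpus_scores):
    print(doc_id, dominant_topic(scores))
```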
1 vote, 0 answers

LSI Model fails to load the model

I have an LSI model stored, and it is saved as model.pkl and model.pkl.projection. However, when I try to load the model, loading fails because it is trying to look for the projection file with .npy: loading LsiModel object from…
Shrikar
1 vote, 0 answers

Calculate conceptual and relation similarity of two words in Java

I am implementing a readability formula in Java based on this paper. I reached the point where I have to compute the conceptual and the relational similarity of two or more words. They say: We use Latent Semantic Analysis (LSA) tools to compute…
João Alves
1 vote, 1 answer

Simple Binary Text Classification

I am looking for the most effective and simple way to classify 800k+ scholarly articles as either relevant (1) or irrelevant (0) with respect to a defined conceptual space (here: learning as it relates to work). Data: title & abstract (mean=1300…
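This swaps in a technique the question does not specify: represent titles and abstracts in LSI space, average a small hand-labelled set of relevant articles into a centroid, and label each article by its cosine similarity to that centroid. The count matrix, seed set, k and the 0.5 threshold are hypothetical and would need tuning against labelled data; at 800k+ documents a sparse matrix and a truncated SVD solver would replace the dense np.linalg.svd call.

```python
import numpy as np

def lsi_doc_vectors(A, k):
    """Documents (columns of term-document matrix A) in k-dim LSI space, scaled by S_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def relevance_scores(doc_vectors, seed_rows):
    """Cosine similarity of every document to the centroid of labelled relevant documents."""
    centroid = doc_vectors[seed_rows].mean(axis=0)
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(centroid)
    return (doc_vectors @ centroid) / np.where(norms == 0, 1.0, norms)

def classify(scores, threshold):
    """1 = relevant, 0 = irrelevant."""
    return (scores >= threshold).astype(int)

# Hypothetical 5-term x 4-document counts; document 0 is the only labelled seed.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

docs_k = lsi_doc_vectors(A, k=2)
scores = relevance_scores(docs_k, seed_rows=[0])
labels = classify(scores, threshold=0.5)
print(np.round(scores, 3), labels)
```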
1 vote, 2 answers

Number of Latent Semantic Indexing topics

I'm using the gensim package to implement LSI on a corpus. My goal is to find the most frequently occurring distinct topics in the corpus. If I don't know the number of topics in the corpus (I'd estimate anywhere from 5 to…
John Ma
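One common heuristic, offered as an assumption rather than a rule: keep the smallest number of topics whose singular values retain a chosen share (say 90%) of the total squared "energy" of the spectrum. A pure-Python sketch that takes the singular values from whatever SVD was run; the 0.9 cut-off is arbitrary, and inspecting the resulting topics directly is still worthwhile.

```python
def choose_num_topics(singular_values, energy=0.9):
    """Smallest k whose top-k singular values retain `energy` of the total squared sum."""
    squares = [s * s for s in sorted(singular_values, reverse=True)]
    total = sum(squares)
    running = 0.0
    for k, sq in enumerate(squares, start=1):
        running += sq
        if running >= energy * total:
            return k
    return len(squares)

# Hypothetical spectrum: 9 + 4 = 13 of 14.25 total is above 90%, so k = 2.
print(choose_num_topics([3.0, 2.0, 1.0, 0.5]))
```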
1 vote, 1 answer

How to obtain the topic score in LSI model of Gensim?

I have been using LsiModel in gensim to model topics from a corpus of 10,000 emails. I am able to get the words and word scores for each topic and store them in a file. I have tried using print_topics() and show_topics(), but both return only the…
manofsins
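print_topics() and show_topics() describe topics by their word weights. If "topic score" means how strongly each email loads on each topic, that comes from transforming each document: in gensim, lsi[bow] yields a list of (topic_id, score) pairs. A minimal sketch, assuming that pair format, that turns the pairs into a dense document-by-topic table and writes it as CSV; the helper names and example scores are hypothetical.

```python
import csv
import io

def to_dense(corpus_scores, num_topics):
    """Turn per-document lists of (topic_id, score) pairs into dense rows.

    Topics missing from a document's list get 0.0.
    """
    rows = []
    for pairs in corpus_scores:
        row = [0.0] * num_topics
        for topic_id, score in pairs:
            row[topic_id] = score
        rows.append(row)
    return rows

def write_csv(rows, out):
    """Write a header plus one line per document to a file-like object."""
    writer = csv.writer(out)
    num_topics = len(rows[0]) if rows else 0
    writer.writerow(["doc_id"] + [f"topic_{i}" for i in range(num_topics)])
    for doc_id, row in enumerate(rows):
        writer.writerow([doc_id] + row)

buf = io.StringIO()
rows = to_dense([[(0, 0.8), (2, -0.1)], [(1, 0.4)]], num_topics=3)
write_csv(rows, buf)
print(buf.getvalue())
```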
1 vote, 1 answer

Latent semantic analysis (LSA) singular value decomposition (SVD) understanding

Bear with me through my modest understanding of LSI (Mechanical Engineering background): After performing SVD in LSI, you have 3 matrices: U, S, and V transpose. U compares words with topics and S is a sort of measure of strength of each feature. Vt…
user2040444
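A small NumPy check of what the three matrices do, on a random hypothetical count matrix: U @ diag(s) @ Vt rebuilds A exactly; keeping the first k columns of U, the first k values of s and the first k rows of Vt gives the best rank-k approximation; and its Frobenius error equals the square root of the sum of the discarded squared singular values (the Eckart-Young theorem). Scaling U_k by s_k gives term coordinates, and scaling Vt_k by s_k gives document coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(6, 5)).astype(float)  # hypothetical 6 terms x 5 documents

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# U: terms x concepts, s: concept strengths (descending), Vt: concepts x documents.
print(np.allclose(U @ np.diag(s) @ Vt, A))  # exact reconstruction

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation
err = np.linalg.norm(A - A_k)                 # Frobenius norm of the residual
print(err, np.sqrt(np.sum(s[k:] ** 2)))       # the two numbers agree

term_coords = U[:, :k] * s[:k]                # one row per term
doc_coords = (Vt[:k, :] * s[:k, None]).T      # one row per document
```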
0 votes, 0 answers

MIMIC model failed to converge: using lavaan to assess the effect of maternal empowerment on child malnutrition

indicators <- c("stunting", "wasting", "underweight_small") independent_vars<- c("colostrum", "ors", "familyplanning","foodgroups", "visits", "occupation", "respIncome", "resphealthdec", "lhhpurchases", "ownhouse", "wland") control_vars <-…
0 votes, 2 answers

Why are the signs of my topic weights changing from run to run?

I'm running the LSI program from Gensim's Topics and Transformations tutorial and for some reason, the signs of the topic weights keep switching from positive to negative and vice versa. For example, this is what I get when I print using the…
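Sign flips are expected: if (u_i, v_i) is a pair of singular vectors, so is (-u_i, -v_i), and the product U S V^T is unchanged, so a solver is free to return either sign (iterative or randomized solvers, such as the one gensim's LsiModel uses, may differ from run to run). A NumPy sketch of one common normalization convention, similar in spirit to scikit-learn's svd_flip helper: flip each pair so the largest-magnitude entry in each column of U is positive. The helper name fix_signs and the random matrix are hypothetical.

```python
import numpy as np

def fix_signs(U, Vt):
    """Flip singular-vector pairs so each column of U has its largest-|value| entry positive.

    Flipping column i of U together with row i of Vt leaves U @ diag(s) @ Vt unchanged.
    """
    rows = np.argmax(np.abs(U), axis=0)  # position of the largest entry per column
    signs = np.sign(U[rows, np.arange(U.shape[1])])
    signs[signs == 0] = 1.0              # all-zero column: leave as is
    return U * signs, Vt * signs[:, None]

rng = np.random.default_rng(1)
A = rng.random((6, 4))                   # hypothetical term-document weights
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Simulate another run that returned the first concept with the opposite sign.
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

U1, Vt1 = fix_signs(U, Vt)
U2, Vt2 = fix_signs(U_flipped, Vt_flipped)
print(np.allclose(U1, U2), np.allclose(U1 @ np.diag(s) @ Vt1, A))
```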
0 votes, 0 answers

nltk latent semantic analysis copies the first topics over and over

This is my first attempt at natural language processing, so I started with latent semantic analysis and used this tutorial to build the algorithm. After testing it, I see that it only classifies the first semantic words and repeats the same terms…
0 votes, 1 answer

Which formula of tf-idf does the LSA model of gensim use?

There are many different ways in which tf and idf can be calculated. I want to know which formula is used by gensim in its LSA model. I have been going through its source code lsimodel.py, but it is not obvious to me where the document-term matrix…
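This assumes gensim's LsiModel does not compute tf-idf itself: it factorizes whatever corpus it is given, and the gensim tutorials pass the bag-of-words corpus through models.TfidfModel first. Assuming TfidfModel's defaults, the weight is the raw term count times log2(N / df), with each document then normalized to unit length. A pure-Python sketch of that formula for comparison; check the TfidfModel documentation (its wlocal, wglobal and normalize parameters) before relying on it.

```python
import math
from collections import Counter

def tfidf(docs):
    """tf-idf: raw count * log2(N / df), then L2-normalize each document.

    Assumed to mirror gensim TfidfModel's default settings; zero weights
    (terms that occur in every document) are dropped in this sketch.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in Counter(doc).items()}
        weights = {t: w for t, w in weights.items() if w != 0.0}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        weighted.append({t: w / norm for t, w in weights.items()} if norm else {})
    return weighted

docs = [["a", "b", "b", "c"], ["a", "c"], ["a", "d"]]
for vec in tfidf(docs):
    print(vec)
```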
0 votes, 1 answer

AttributeError module 'Pyro4' has no attribute 'expose' while running gensim distributed LSI

I am trying to run gensim's demo for distributed LSI (you can find it here), yet whenever I run the code I get the error AttributeError: module 'Pyro4' has no attribute 'expose'. I have checked similar issues here on Stack Overflow, and…
Abdelrahman Shoman