I am working on latent semantic analysis and trying to compute the similarity between two documents. When I run my latent semantic analysis code in Python, I get:

Here are the singular values
[ 0.7376057   0.4596623   0.25422212]
Here are the first 3 columns of the U matrix
[[ 0.98465137 -0.172792   -0.02458864]
[ 0.15675976  0.81362269  0.55986114]
[ 0.07673365  0.55512255 -0.82822153]]
Here are the first 3 rows of the Vt matrix
[[ 0.08861949  0.02992777  0.36751379  0.9253024 ]
[ 0.78716383  0.34742637  0.43792207 -0.26056147]
[ 0.29462756 -0.93722956  0.17407106 -0.06704194]]

How do I find the similarity from these numbers?

YayaYaya

1 Answer

https://en.wikipedia.org/wiki/Latent_semantic_analysis explains LSA (also called LSI) very well, including your problem.

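Say you want the similarity between documents i and j. Assuming your input matrix has terms as rows and documents as columns (the convention in that article), take the i-th column of V^t (= d_i) and the j-th column of V^t (= d_j).

Compute the cosine similarity of diag(S) * d_i and diag(S) * d_j, where S holds the singular values.

The closer the result is to +1, the more similar the two documents are.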
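
The steps above can be sketched in plain Python with the numbers from the question. This is only an illustration: it assumes each column of Vt corresponds to one document (so there are 4 documents here), and the helper names `scaled_document`, `cosine` and `document_similarity` are made up for this example.

```python
import math

# Singular values and Vt rows copied from the output in the question.
s = [0.7376057, 0.4596623, 0.25422212]
Vt = [
    [0.08861949, 0.02992777, 0.36751379, 0.9253024],
    [0.78716383, 0.34742637, 0.43792207, -0.26056147],
    [0.29462756, -0.93722956, 0.17407106, -0.06704194],
]


def scaled_document(s, Vt, k):
    """Return diag(s) * d_k, where d_k is column k of Vt.

    Assumption: column k of Vt represents document k.
    """
    return [s_r * row[k] for s_r, row in zip(s, Vt)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def document_similarity(s, Vt, i, j):
    """Cosine similarity between documents i and j in the LSA space."""
    return cosine(scaled_document(s, Vt, i), scaled_document(s, Vt, j))


if __name__ == "__main__":
    # Print the similarity for every pair of documents.
    n_docs = len(Vt[0])
    for i in range(n_docs):
        for j in range(i + 1, n_docs):
            print(i, j, round(document_similarity(s, Vt, i, j), 4))
```

If your matrix was built the other way round (documents as rows), the document vectors would be the rows of U instead, scaled by the singular values in the same way.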

hypnoticpoisons