I've spent the last couple of days wrapping my head around implementing Latent Semantic Analysis for documents that are indexed in Elasticsearch. The first step is to build the term-document matrix. I am thinking of using the Stanford NLP library, taking the index as input, to do the preprocessing (lowercasing, removing stopwords, maybe stemming) and generate the matrix. Or is it possible to build it using just the Elasticsearch Java API?
1 Answer
Yes, you can use the _analyze endpoint of Elasticsearch to run tokenizing, character mapping, stemming and so on over your text and get the resulting tokens back.
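As a rough sketch of how that could fit together (Python over the REST endpoint here, but the same request can be sent from any client): the first function posts text to _analyze and collects the returned tokens, the second counts them into a term-document matrix. The helper names, the localhost:9200 URL and the choice of the built-in english analyzer are my assumptions for illustration, not something from your setup, and a secured cluster would also need authentication.

```python
import json
import urllib.request
from collections import Counter


def analyze(text, base_url="http://localhost:9200", analyzer="english"):
    """POST text to the _analyze endpoint and return the token strings.

    base_url and analyzer are assumed defaults; adjust them to your cluster.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each entry in "tokens" carries the analyzed term under the "token" key.
    return [entry["token"] for entry in payload["tokens"]]


def build_term_document_matrix(token_lists):
    """Return (vocabulary, matrix); matrix[i][j] counts term i in document j."""
    vocabulary = sorted({token for tokens in token_lists for token in tokens})
    row_of = {term: i for i, term in enumerate(vocabulary)}
    matrix = [[0] * len(token_lists) for _ in vocabulary]
    for j, tokens in enumerate(token_lists):
        for term, count in Counter(tokens).items():
            matrix[row_of[term]][j] = count
    return vocabulary, matrix
```

With a running cluster you would build the input as something like token_lists = [analyze(doc) for doc in documents] and pass it to build_term_document_matrix. For a large index a sparse matrix would probably be a better fit than nested lists.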

Mohammad Mazraeh