Override similarity in Lucene and use LSA+SVD instead

Question

I'm working on an existing project that uses Lucene for searching and returning matches. It doesn't use any custom analyzer or external algorithm. The documents are tiny: each is a row of no more than 50 words. My understanding is that LSA with SVD works better on short text like this than tf-idf does (tf-idf usually works well when each document contains long text), so I want to use LSA with SVD as the similarity metric when searching for non-exact word matches. My questions are:

Do I need a custom analyzer? I searched for that, but from what I found, a custom analyzer is mainly for analyzing the documents, not for applying a similarity metric.
Or do I need to change the similarity, as described at this link: https://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/package-summary.html#changingSimilarity?

If so, are there any examples of using LSA as a custom similarity? I'm quite new to Java and Lucene and I'm lost on how to start. Any help would be appreciated.

There are millions of documents in total, but each contains only a few words.
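
To make clear what I mean by LSA as a similarity, here is a toy sketch in plain Java. It does not touch Lucene at all, and every name in it (`LsaSketch`, `topLeftSingularVectors`, `project`, the five-term vocabulary) is made up for illustration. It approximates a truncated SVD of a small term-document matrix with power iteration, then compares a query and a document by cosine similarity in the reduced space. The brute-force loop is only for illustration and would not scale to millions of documents.

```java
import java.util.Random;

/** Toy LSA: truncated SVD of a term-document matrix via power iteration. */
class LsaSketch {
    // Rows = terms, columns = documents (hypothetical toy corpus).
    static final double[][] TERM_DOC = {
        {1, 0, 0, 0},   // car
        {0, 1, 0, 0},   // automobile
        {1, 1, 0, 0},   // engine
        {0, 0, 1, 1},   // flower
        {0, 0, 1, 0},   // petal
    };

    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    static double cosine(double[] x, double[] y) {
        double denom = Math.sqrt(dot(x, x) * dot(y, y));
        return denom == 0 ? 0 : dot(x, y) / denom;
    }

    /** Approximates the top-k left singular vectors of a (terms x docs). */
    static double[][] topLeftSingularVectors(double[][] a, int k, int iters) {
        int t = a.length, n = a[0].length;
        double[][] m = new double[t][t];          // m = a * a^T (terms x terms)
        for (int i = 0; i < t; i++)
            for (int j = 0; j < t; j++)
                for (int d = 0; d < n; d++) m[i][j] += a[i][d] * a[j][d];

        Random rnd = new Random(42);              // fixed seed for repeatability
        double[][] u = new double[k][];
        for (int c = 0; c < k; c++) {
            double[] v = new double[t];
            for (int i = 0; i < t; i++) v[i] = rnd.nextDouble() + 0.1;
            for (int it = 0; it < iters; it++) {
                double[] w = new double[t];
                for (int i = 0; i < t; i++) w[i] = dot(m[i], v);
                // Deflation: remove components along vectors already found.
                for (int p = 0; p < c; p++) {
                    double proj = dot(w, u[p]);
                    for (int i = 0; i < t; i++) w[i] -= proj * u[p][i];
                }
                double norm = Math.sqrt(dot(w, w));
                if (norm < 1e-12) break;          // matrix rank is below k
                for (int i = 0; i < t; i++) v[i] = w[i] / norm;
            }
            u[c] = v;
        }
        return u;
    }

    /** Folds a term-count vector into the k-dimensional latent space (U_k^T x). */
    static double[] project(double[][] u, double[] termVector) {
        double[] z = new double[u.length];
        for (int c = 0; c < u.length; c++) z[c] = dot(u[c], termVector);
        return z;
    }

    public static void main(String[] args) {
        double[][] u = topLeftSingularVectors(TERM_DOC, 2, 200);
        double[] query = {1, 0, 0, 0, 0};         // "car"
        double[] doc1  = {0, 1, 1, 0, 0};         // "automobile engine"
        double[] doc2  = {0, 0, 0, 1, 1};         // "flower petal"
        System.out.println("raw cosine(query, doc1)    = " + cosine(query, doc1));
        System.out.println("latent cosine(query, doc1) = "
                + cosine(project(u, query), project(u, doc1)));
        System.out.println("latent cosine(query, doc2) = "
                + cosine(project(u, query), project(u, doc2)));
    }
}
```

The query "car" shares no term with "automobile engine", so the raw cosine is 0, but both co-occur with "engine" in the corpus, so in the two-dimensional latent space their similarity should come out close to 1, while "flower petal" stays near 0. That is the behaviour I want for non-exact matches; what I don't know is where (or whether) something like this plugs into Lucene's scoring.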

