I'm working on an existing project that uses Lucene for searching and returning matches. It doesn't use any custom analyzer or any external algorithm. The documents are tiny, each a row of no more than 50 words, so I understand that LSA and SVD work better with short texts than with corpora of long documents (where tf-idf usually works well). I want to use LSA and SVD as the similarity metric when searching for matches on non-exact words. My problems are:
Do I need a custom analyzer? I searched for that, but what I found is that a custom analyzer is mainly for analyzing the documents, not for applying a similarity metric. Or do I need to change the similarity instead, as described here (https://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/package-summary.html#changingSimilarity)?
If so, are there any examples of using LSA as a custom similarity? I'm quite new to Java and Lucene and I don't know where to start, so any help would be appreciated.
There are millions of documents in total, but each one has only a few words.
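To make the question concrete, here is a minimal plain-Java sketch of LSA similarity, independent of Lucene. It assumes raw term counts (no tf-idf or log-entropy weighting), lowercase whitespace tokenization, and a hand-picked rank `k`; the class and method names (`LsaSketch`, `scores`) are hypothetical. It builds the term-document matrix A, obtains the top-k singular vectors from the eigenvectors of AᵀA by power iteration, folds the query into the same k-dimensional space, and scores each document by cosine similarity. The dense matrix and the n×n AᵀA only work for a toy corpus; with millions of documents the SVD would have to be a sparse, truncated one computed offline.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

/** Toy LSA (hypothetical helper): raw term counts, rank-k SVD via power iteration, cosine in latent space. */
public class LsaSketch {

    static String[] tokenize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    static double dot(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    static double norm(double[] x) { return Math.sqrt(dot(x, x)); }

    static double cosine(double[] x, double[] y) {
        double nx = norm(x), ny = norm(y);
        return (nx == 0 || ny == 0) ? 0.0 : dot(x, y) / (nx * ny);
    }

    // Top-k eigenvectors of the symmetric PSD matrix g = A^T A by power iteration,
    // keeping each new vector orthogonal to the earlier ones. Eigenvalues are sigma^2.
    static double[][] topEigenvectors(double[][] g, int k, double[] eigenvalues) {
        int n = g.length;
        double[][] vs = new double[k][n];
        Random rnd = new Random(42);                   // fixed seed: reproducible start vectors
        for (int j = 0; j < k; j++) {
            double[] v = vs[j];
            for (int i = 0; i < n; i++) v[i] = rnd.nextDouble() - 0.5;
            for (int iter = 0; iter < 500; iter++) {
                double[] w = new double[n];
                for (int r = 0; r < n; r++) w[r] = dot(g[r], v);
                for (int p = 0; p < j; p++) {          // remove components along earlier eigenvectors
                    double c = dot(w, vs[p]);
                    for (int i = 0; i < n; i++) w[i] -= c * vs[p][i];
                }
                eigenvalues[j] = norm(w);              // ||G v|| equals lambda once v has converged
                if (eigenvalues[j] < 1e-12) { Arrays.fill(v, 0.0); break; } // k exceeds the rank
                for (int i = 0; i < n; i++) v[i] = w[i] / eigenvalues[j];
            }
        }
        return vs;
    }

    /** Cosine similarity between the query and each document in a k-dimensional LSA space. */
    public static double[] scores(List<String> docs, String query, int k) {
        Map<String, Integer> vocab = new HashMap<>();
        for (String d : docs)
            for (String t : tokenize(d)) vocab.putIfAbsent(t, vocab.size());
        int m = vocab.size(), n = docs.size();
        double[][] a = new double[m][n];               // term-by-document counts A
        for (int d = 0; d < n; d++)
            for (String t : tokenize(docs.get(d))) a[vocab.get(t)][d] += 1.0;

        double[][] g = new double[n][n];               // G = A^T A (document-by-document)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int t = 0; t < m; t++) g[i][j] += a[t][i] * a[t][j];
        double[] eig = new double[k];
        double[][] v = topEigenvectors(g, k, eig);

        double[] atq = new double[n];                  // A^T q: query overlap with each document
        for (String t : tokenize(query)) {
            Integer idx = vocab.get(t);                // terms unseen in the corpus are ignored
            if (idx != null) for (int d = 0; d < n; d++) atq[d] += a[idx][d];
        }

        // Project query and documents with U_k^T, where u_j = A v_j / sigma_j:
        //   query coordinate j      = (A^T q . v_j) / sigma_j
        //   document i coordinate j = sigma_j * v_j[i]   (because U^T A = Sigma V^T)
        double[] qk = new double[k];
        double[][] dk = new double[n][k];
        for (int j = 0; j < k; j++) {
            double sigma = Math.sqrt(eig[j]);
            if (sigma < 1e-10) continue;               // dimension beyond the rank of A: leave at 0
            qk[j] = dot(atq, v[j]) / sigma;
            for (int i = 0; i < n; i++) dk[i][j] = sigma * v[j][i];
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = cosine(qk, dk[i]);
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("car engine repair", "automobile engine repair",
                "banana fruit smoothie", "fruit smoothie recipe", "fruit smoothie");
        double[] s = scores(docs, "car", 2);
        for (int i = 0; i < docs.size(); i++) System.out.printf("%.3f  %s%n", s[i], docs.get(i));
    }
}
```

On the five toy documents in `main`, the query "car" should score "automobile engine repair" close to 1 even though that row never contains "car", while the smoothie rows score close to 0. Whether such a score belongs inside a Lucene similarity or in a separate step applied to the search results is exactly the open question above.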