As I understand it, PyLucene now also offers BM25 similarity. I am using PyLucene 4.10.1, but I can't find any example of how to use BM25 instead of tf-idf. Please advise.
1 Answer
Use the setSimilarity method of IndexSearcher to set the retrieval model:
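import lucene
from java.nio.file import Paths
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.index import DirectoryReader
from org.apache.lucene.search import IndexSearcher
from org.apache.lucene.search.similarities import BM25Similarity

# Start the JVM before using any Lucene class.
lucene.initVM(vmargs=['-Djava.awt.headless=true'])

# INDEX_DIR must point to an existing index; "index" is a placeholder path.
INDEX_DIR = "index"
# Paths.get() matches the Path-based directory API of Lucene 5 and later.
# On Lucene 4.x (e.g. 4.10.1), pass java.io.File(INDEX_DIR) instead.
directory = SimpleFSDirectory(Paths.get(INDEX_DIR))
searcher = IndexSearcher(DirectoryReader.open(directory))
# Score queries run through this searcher with BM25.
searcher.setSimilarity(BM25Similarity())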
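
BM25Similarity also has a two-argument constructor, BM25Similarity(k1, b), with defaults k1 = 1.2 and b = 0.75. To get a feel for what those two parameters do before tuning them, here is a rough pure-Python sketch of the classic BM25 score for a single term. The function name bm25_term_score and the sample numbers are made up for illustration, and Lucene's actual implementation differs in details (for example, document lengths are stored in a lossy compressed form), so don't expect identical scores.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, num_docs, doc_freq, k1=1.2, b=0.75):
    """Classic BM25 score of one query term in one document (illustrative only)."""
    # Rarer terms (small doc_freq) get a larger idf weight.
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly long documents are penalised; b=0 turns that off.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    tf = freq * (k1 + 1.0) / (freq + k1 * length_norm)
    return idf * tf

# Same term frequency, but one document is much longer than average.
short_doc = bm25_term_score(freq=3, doc_len=50, avg_doc_len=100, num_docs=1000, doc_freq=10)
long_doc = bm25_term_score(freq=3, doc_len=400, avg_doc_len=100, num_docs=1000, doc_freq=10)
print(short_doc > long_doc)
```

With b above 0 the shorter document scores higher for the same term count, and however often the term repeats, the term-frequency part never exceeds k1 + 1.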

Salias