
What is the difference between TFIDFSimilarity, DefaultSimilarity, and SweetSpotSimilarity in Lucene 7.5.1?

How can we implement BM25F in Lucene?

zero323
Rocky

1 Answer

  • TFIDFSimilarity - An abstract base class for TF-IDF similarities. A fairly straightforward tf-idf implementation. The exact algorithm is well documented in the TFIDFSimilarity Javadoc.

  • DefaultSimilarity - Not a thing anymore. It was deprecated in the 5.x line in favor of ClassicSimilarity and removed in 6.0.

  • ClassicSimilarity - The old default similarity. An implementation of TFIDFSimilarity. Adds baseline calculations for tf, idf, length norms and encoding/decoding of norms, etc.

  • SweetSpotSimilarity - An alternate implementation of TFIDFSimilarity. Extends ClassicSimilarity, primarily changing how length norms are calculated.

  • BM25Similarity - The current default similarity implementation. Implementation of Okapi BM25.
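
To make the differences between these classes concrete, here is a minimal standalone Java sketch of the per-term factors as I understand them from the Lucene 7.x Javadocs. It does not call Lucene at all: the class name `SimilaritySketch` and its method names are hypothetical, and the real classes also apply boosts and store length norms in a lossy encoded form, which is left out here.

```java
// Hypothetical standalone sketch; these are not Lucene classes or methods.
final class SimilaritySketch {

    // ClassicSimilarity term frequency factor: sqrt(freq).
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    // ClassicSimilarity inverse document frequency: 1 + ln((docCount + 1) / (docFreq + 1)).
    static double classicIdf(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    // ClassicSimilarity length norm: 1 / sqrt(numTerms), so shorter fields always score higher.
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // SweetSpotSimilarity length norm: flat at 1.0 for lengths inside [min, max],
    // decaying outside that range at a rate controlled by steepness.
    static double sweetSpotLengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    // BM25 inverse document frequency: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)).
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 term frequency factor: saturates towards (k1 + 1) as freq grows,
    // with b controlling how strongly document length is normalized.
    static double bm25Tf(double freq, double docLen, double avgDocLen, double k1, double b) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return freq * (k1 + 1.0) / (freq + norm);
    }
}
```

The main things to notice: the classic tf grows without bound while the BM25 tf saturates, and SweetSpotSimilarity differs from ClassicSimilarity only in the length norm, which does not penalize fields whose length falls inside the configured range.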

As for BM25F, I'm not aware of an out-of-the-box implementation. You'll likely want to modify BM25Similarity to suit that purpose. The article "BM25F in Lucene with BlendedTermQuery" may be helpful.
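
For orientation, here is a minimal standalone sketch of the simple BM25F formula for a single query term, assuming the common formulation in which per-field term frequencies are length-normalized, weighted, and summed into one pseudo-frequency before saturation is applied once. This is not Lucene code and not a drop-in Similarity: the class `Bm25fSketch`, its method names, and the parameter layout (parallel arrays indexed by field) are all hypothetical, and the idf is assumed to be computed over whole documents rather than per field.

```java
// Hypothetical standalone sketch of simple BM25F for one query term; not a Lucene API.
final class Bm25fSketch {

    // Document-level idf, using the same shape as the BM25 idf.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // freqs[f]        : term frequency in field f of this document
    // fieldLens[f]    : length of field f in this document
    // avgFieldLens[f] : average length of field f across the collection
    // weights[f]      : boost for field f
    // b[f]            : length-normalization strength for field f, in [0, 1]
    static double score(double[] freqs, double[] fieldLens, double[] avgFieldLens,
                        double[] weights, double[] b, double k1, double idf) {
        double pseudoFreq = 0.0;
        for (int f = 0; f < freqs.length; f++) {
            // Per-field length normalization, applied before the fields are combined.
            double lengthNorm = 1.0 - b[f] + b[f] * fieldLens[f] / avgFieldLens[f];
            pseudoFreq += weights[f] * freqs[f] / lengthNorm;
        }
        // Saturation is applied once, to the combined pseudo-frequency.
        return idf * pseudoFreq / (k1 + pseudoFreq);
    }
}
```

The design point is the order of operations: scoring each field separately with BM25 and then summing saturates every field on its own, whereas BM25F combines the fields first and saturates once, so a term repeated across several fields is not rewarded several times over.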

femtoRgon
  • Thanks for the valuable information. TFIDFSimilarity uses the Vector Space Model; what about BM25, which is now the default similarity in Lucene, as you mentioned? – Rocky Oct 30 '18 at 06:25
  • I found two research papers as well: (1) https://www.sciencedirect.com/science/article/pii/S0306457300000157?via%3Dihub and (2) https://www.sciencedirect.com/science/article/pii/S0306457300000169?via%3Dihub. Thought someone might be interested :) – Rocky Oct 31 '18 at 04:35