
How do I get a similarity measure of a document using Whoosh?

I want to create a "Related" feature that ranks other previously indexed documents that have a high similarity to a document.

Do I input the document as a long query string? Do I add the document to the index and extract a similarity query result somehow from there?

Thanks

seanieb

1 Answer


The Whoosh Searcher class (whoosh.searching.Searcher) has a method called 'more_like()'.

It compares an indexed document to the other indexed documents and returns the ones most similar to it, so you do not need to paste the document in as a long query string.

Each result is a whoosh.searching.Hit, which gives you a rank and a score.

Updated links:

more_like() : https://whoosh.readthedocs.io/en/latest/api/searching.html#whoosh.searching.Searcher.more_like
whoosh.searching.Hit : https://whoosh.readthedocs.io/en/latest/api/searching.html#whoosh.searching.Hit

karel
seanieb