I've just started using Whoosh and noticed that a query combines its terms with a single operator, such as AND([term1, term2, ...]) or OR([term1, term2, ...]).

My problem is that I want to include documents that contain most of the terms in my search string, but not necessarily all of them. The more terms a document contains, the more "relevant" it should be. For example, if I search for "big brown cow", I want the results to include documents that match only "brown" and "cow", or only "big" and "brown", and not just documents that match all three. Of course, documents that contain all of the terms should rank higher than the others.

How can I accomplish this? (Without having to do a separate search for each individual combination of terms!)

Trindaz

1 Answer

You can configure the Whoosh parser to default to using OR rather than AND between query terms. See http://packages.python.org/Whoosh/parsing.html#common-customizations.

You can then write a custom scoring class that scores items higher if they have more of the search terms. See http://packages.python.org/Whoosh/searching.html#scoring-and-sorting and http://packages.python.org/Whoosh/api/scoring.html#module-whoosh.scoring.
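
To make the ranking idea concrete, here is a library-independent sketch in plain Python. It is not Whoosh's scoring API; the function name and the sample documents are invented for illustration. It only shows the rule a custom scorer would need to implement: count the matched query terms and sort by that count.

```python
def rank_by_matched_terms(query, documents):
    """Return (doc_id, matched_count) pairs, best match first.

    Documents that contain none of the query terms are left out,
    which mirrors what an OR query would return.
    """
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if matched:
            scored.append((doc_id, matched))
    # More matched terms first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical sample data.
docs = {
    "a": "the big brown cow",
    "b": "a brown cow",
    "c": "a big dog",
    "d": "a small cat",
}
ranking = rank_by_matched_terms("big brown cow", docs)
print(ranking)
```

A real Whoosh scorer would also take term frequency and field length into account; this sketch deliberately ignores both.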

In all, the documentation is a good place to start looking for answers to questions like these.

kindall