I'm working on a project that uses Google App Engine's text search API to allow users to search for documents that include a words field. I'm sorting using a MatchScorer, which according to the documentation "assigns a score based on term frequency in a document".
When a user enters a query like "business promo", I convert this into a query string that looks like this:
words:business OR words:promo
I would have expected this to return documents that contain both "business" and "promo" before documents that contain only one of the words (since the documentation says it assigns a score based on term frequency in the document). However, I frequently see results that contain only one of the words ranked before documents that contain both.
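For reference, here is a minimal sketch of that conversion, assuming the input is split on whitespace and that terms need no quoting or escaping. The function name build_or_query and the field parameter are made up for illustration; the real code may differ.

```python
def build_or_query(user_input, field="words"):
    """Turn free-text input into an OR query over a single field.

    Assumption: terms are separated by whitespace and contain no
    characters that would need quoting in the query string.
    """
    terms = user_input.split()
    return " OR ".join("{}:{}".format(field, term) for term in terms)


print(build_or_query("business promo"))
# prints: words:business OR words:promo
```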
I've also tried querying with the RescoringMatchScorer, but I see the same problem.
I've thought about doing separate queries - ones that AND the search terms and ones that OR the search terms - but this would require many queries if the user enters more than two search terms. For example, if I searched for "advanced business solutions", I'd need queries like this to cover all the bases:
words:advanced AND words:business AND words:solutions
words:advanced AND words:business
words:advanced AND words:solutions
words:business AND words:solutions
words:advanced OR words:business OR words:solutions
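To make the growth concrete, the list above could be generated with something like the following sketch. It is only an illustration of the multi-query workaround, not a recommended solution; fallback_queries is a hypothetical helper name, and it assumes the terms are already split and need no escaping.

```python
from itertools import combinations


def fallback_queries(terms, field="words"):
    """List query strings from most specific to least specific.

    First every AND combination, from all terms down to pairs,
    then a single OR query over all terms as the final fallback.
    """
    if not terms:
        return []
    clauses = ["{}:{}".format(field, term) for term in terms]
    queries = []
    # Larger combinations first, so stricter matches are queried earlier.
    for size in range(len(clauses), 1, -1):
        for combo in combinations(clauses, size):
            queries.append(" AND ".join(combo))
    queries.append(" OR ".join(clauses))
    return queries


for query in fallback_queries(["advanced", "business", "solutions"]):
    print(query)
```

For n search terms this produces 2^n - n query strings (5 for three terms, 12 for four), which is why I'd like to avoid this approach.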
Does anyone have any hints on how to perform searches that return more relevant results (i.e. more search term matches) before less relevant results?