9

I have a problem with score calculation for a PrefixQuery. To change the score of each document, I call setBoost on the document when adding it to the index. I then create a PrefixQuery to search, but the results do not change according to the boost. It seems setBoost has no effect at all for a PrefixQuery. Please check my code below:

@Test
public void testNormsDocBoost() throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);
    Document doc1 = new Document();
    Field f1 = new Field("contents", "common1", Field.Store.YES, Field.Index.ANALYZED);
    doc1.add(f1);
    doc1.setBoost(100);
    writer.addDocument(doc1);
    Document doc2 = new Document();
    Field f2 = new Field("contents", "common2", Field.Store.YES, Field.Index.ANALYZED);
    doc2.add(f2);
    doc2.setBoost(200);
    writer.addDocument(doc2);
    Document doc3 = new Document();
    Field f3 = new Field("contents", "common3", Field.Store.YES, Field.Index.ANALYZED);
    doc3.add(f3);
    doc3.setBoost(300);
    writer.addDocument(doc3);
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    TopDocs docs = searcher.search(new PrefixQuery(new Term("contents", "common")), 10);
    for (ScoreDoc doc : docs.scoreDocs) {
        System.out.println("docid : " + doc.doc + " score : " + doc.score + " "
                + searcher.doc(doc.doc).get("contents"));
    }
}

The output is:

 docid : 0 score : 1.0 common1
 docid : 1 score : 1.0 common2
 docid : 2 score : 1.0 common3
WorkSmarter
Keven

3 Answers

11

By default, PrefixQuery rewrites the query to use ConstantScoreQuery, which gives every matching document a score of 1.0. I think this is done to make PrefixQuery faster. As a result, your boosts are ignored.

If you want the boosts to take effect in your PrefixQuery, you need to call setRewriteMethod() on your PrefixQuery instance, passing the MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE constant. See http://lucene.apache.org/java/2_9_1/api/all/index.html .
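A minimal sketch of that call, assuming the Lucene 2.9/3.0 Java API used in the question. The class name ScoringPrefixQuerySketch and the helper scoringPrefixQuery are made up for illustration and are not part of Lucene:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.PrefixQuery;

// Hypothetical helper class for illustration; not part of Lucene.
public class ScoringPrefixQuerySketch {

    // Builds a PrefixQuery that is rewritten into a scoring BooleanQuery
    // instead of the default constant-score form, so index-time boosts
    // (stored in the field norms) can influence the score.
    static PrefixQuery scoringPrefixQuery(String field, String prefix) {
        PrefixQuery query = new PrefixQuery(new Term(field, prefix));
        query.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
        return query;
    }
}
```

In the test from the question, the search line would then become searcher.search(ScoringPrefixQuerySketch.scoringPrefixQuery("contents", "common"), 10). Keep in mind that this rewrite method can throw BooleanQuery.TooManyClauses when the prefix matches more terms than the BooleanQuery clause limit allows.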

For debugging, you can use searcher.explain().

bajafresh4life
  • Note this also seems to apply when using setBoost at the field level, i.e. PrefixQuery will appear to ignore the field boosts unless you change the rewrite method as described here. – Simon Keep Jun 02 '11 at 17:09
2

This is the expected behavior. Here is the explanation from Lucene's creator, Doug Cutting:

A PrefixQuery is equivalent to a query containing all the terms matching the prefix, and hence usually contains a lot of terms. With such a big query, matching documents are likely to contain fewer of the query terms and the match is thus weaker.

Read the original post from which the quote is taken.

With Lucene, it is generally better to use the score only as a relative measure of relevancy in a set of documents. The absolute value of the score will change depending on so many factors that it should not be used as is.

UPDATE
The explanation from Cutting refers to an older version of Lucene. Thus the answer from bajafresh4life is the correct one.

Pascal Dimassimo
1

Changing the Rewrite Method

Bajafresh4life suggested calling setRewriteMethod. However, that's not how you change this in Lucene.Net. Here's how to do it in C#:

By default, each PrefixQuery is returned by the NewPrefixQuery method of QueryParser like so:

protected internal virtual Query NewPrefixQuery(Term prefix)
{
    return new PrefixQuery(prefix) { RewriteMethod = multiTermRewriteMethod };
}

You can change this after instantiating your parser by setting the QueryParser.MultiTermRewriteMethod property, like so:

var parser = new QueryParser( Version.LUCENE_30, field, analyzer );
parser.MultiTermRewriteMethod = MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE;

Note that this will change the behavior for other queries as well, not just the prefix query. To affect just the prefix query, you can subclass QueryParser and override NewPrefixQuery so that the constructor for the returned PrefixQuery uses the rewrite method of your choice.

Which Rewrite Method to Use

That doesn't seem to have fixed it for me, though. I actually had better luck using MultiTermQuery.CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE. In the description for this method, it says

Like SCORING_BOOLEAN_QUERY_REWRITE except scores are not computed. Instead, each matching document receives a constant score equal to the query's boost.

But that could be because I also subclassed PrefixQuery and overrode Rewrite to assign the scores I want as boosts.

After a fair amount of debugging, I eventually figured out that when I was using SCORING_BOOLEAN_QUERY_REWRITE, DefaultSimilarity.QueryNorm was interfering with my scores: the value it returns is passed to Weight.Normalize, which is called from Query.Weight.
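To illustrate why the normalization step changes the numbers, here is a sketch of the arithmetic in plain Java (to match the question's code), with no Lucene dependency. It assumes the DefaultSimilarity formula queryNorm = 1 / sqrt(sumOfSquaredWeights); the class and method names are hypothetical, and real Weight implementations also factor in idf, so treat this as a simplified model rather than Lucene's actual code:

```java
import java.util.Arrays;

// Simplified, hypothetical model of query normalization; not Lucene code.
public class QueryNormSketch {

    // Mirrors the assumed DefaultSimilarity formula.
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Scales every raw clause weight (for example, a per-term boost)
    // by the same queryNorm factor.
    static double[] normalize(double[] rawWeights) {
        double sumOfSquares = 0.0;
        for (double w : rawWeights) {
            sumOfSquares += w * w;
        }
        double norm = queryNorm(sumOfSquares);
        double[] normalized = new double[rawWeights.length];
        for (int i = 0; i < rawWeights.length; i++) {
            normalized[i] = rawWeights[i] * norm;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Two clauses with raw weights 3 and 4: the sum of squares is 25,
        // so both weights are scaled by 1/5.
        System.out.println(Arrays.toString(normalize(new double[] {3.0, 4.0})));
    }
}
```

With raw weights of 3 and 4, the normalized values come out near 0.6 and 0.8. The relative order is preserved, but the absolute values no longer equal the boosts that were assigned, which matches the kind of interference described above.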

DCShannon