Lucene: Document fields with weights

Question

I want to use Lucene to index documents together with a large number of weighted tags (each weight is the probability that the tag is true). These fields would all be named 'tag', so that a search can target the tags and return the documents whose matching tags have the highest probabilities.

The code below only illustrates what I would like to do.

However, field boosting in Lucene is meant to be applied to indexed fields and to the type of field, rather than to each instance added to the document. This means that the solution below does not work, and I would need fields with unique names in order to boost them individually.

I know that unique field names would be a very bad solution, and I wonder if somebody here knows a better way to do this. I would need a way to a) store the probabilities and b) use them in the retrieval process.

private void indexDocuments(IndexWriter writer) throws IOException {

    Document docA = new Document();
    Field pathFieldA = new StringField("path", "dog.jpg", Field.Store.YES);
    docA.add(pathFieldA);
    // add all tags to the index, each with its probability as the boost
    StringField tagA1 = new StringField("tag", "dog", Field.Store.YES);
    tagA1.setBoost(0.8f);
    docA.add(tagA1);
    StringField tagA2 = new StringField("tag", "cat", Field.Store.YES);
    tagA2.setBoost(0.2f);
    docA.add(tagA2);

    Document docB = new Document();
    Field pathFieldB = new StringField("path", "cat.jpg", Field.Store.YES);
    docB.add(pathFieldB);
    // add all tags to the index
    StringField tagB1 = new StringField("tag", "dog", Field.Store.YES);
    tagB1.setBoost(0.2f);
    docB.add(tagB1);
    StringField tagB2 = new StringField("tag", "cat", Field.Store.YES);
    tagB2.setBoost(0.8f);
    docB.add(tagB2);

    writer.addDocument(docA);
    writer.addDocument(docB);
}

possible duplicate of [Boosting Lucene Terms When Building the Index](http://stackoverflow.com/questions/8880396/boosting-lucene-terms-when-building-the-index) — bcoughlan, Oct 22 '14 at 10:02
Use Lucene payloads to stick the term boosts in the index (see the duplicate marked question). If your term boosts are going to change over time, then you will need to do query time boosting by looking up the term boosts in a database or HashMap. — bcoughlan, Oct 22 '14 at 10:04
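
A minimal sketch of the HashMap variant suggested in the comment above, in plain Java and without any Lucene calls. It assumes the candidate paths come from an ordinary Lucene search on the `tag` field and that the probabilities live in a side map. The class `TagWeightRanker` and its methods are hypothetical names made up for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: keeps tag probabilities outside the index and
// re-ranks search hits by them at query time.
class TagWeightRanker {

    // document path -> (tag -> probability)
    private final Map<String, Map<String, Float>> weights = new HashMap<>();

    // a) store the probability of a tag for one document
    void put(String path, String tag, float probability) {
        weights.computeIfAbsent(path, k -> new HashMap<>()).put(tag, probability);
    }

    // Returns the stored probability, or 0 if the document or tag is unknown.
    float weightOf(String path, String tag) {
        Map<String, Float> tags = weights.get(path);
        if (tags == null) {
            return 0f;
        }
        Float w = tags.get(tag);
        return w == null ? 0f : w;
    }

    // b) use the probabilities at retrieval time: order the candidate
    // paths (assumed to be the hits of a search on "tag") by descending
    // probability of the queried tag.
    List<String> rank(List<String> candidatePaths, String tag) {
        List<String> ranked = new ArrayList<>(candidatePaths);
        ranked.sort(Comparator.comparingDouble((String p) -> -weightOf(p, tag)));
        return ranked;
    }

    public static void main(String[] args) {
        TagWeightRanker ranker = new TagWeightRanker();
        ranker.put("dog.jpg", "dog", 0.8f);
        ranker.put("dog.jpg", "cat", 0.2f);
        ranker.put("cat.jpg", "dog", 0.2f);
        ranker.put("cat.jpg", "cat", 0.8f);

        List<String> hits = Arrays.asList("dog.jpg", "cat.jpg");
        System.out.println(ranker.rank(hits, "cat")); // prints [cat.jpg, dog.jpg]
    }
}
```

Keeping the weights outside the index means they can change without reindexing, which is the case the comment singles out. The cost is an extra lookup per hit, and the ranking ignores Lucene's own score.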
Query time boosting is the `t.getBoost()` in the scoring formula: https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html#formula_termBoost — bcoughlan, Oct 22 '14 at 10:05
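
For context, the practical scoring function that the linked `TFIDFSimilarity` page describes has, as far as I recall it from the Lucene 4.x documentation, the following shape. Treat it as a reading aid and check the linked page for the exact definition of each factor.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot
  t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)
```

Here `t.getBoost()` is the boost set on term `t` in the query, so it is the same for every document that contains `t`. The index-time field boost enters through `norm(t,d)`.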
