How to use highlighter in pyLucene?

Question

I have read some tutorials about highlighting search terms in Lucene, and came up with a piece of code like this:

(...)
query = parser.parse(query_string)

for scoreDoc in searcher.search(query, 50).scoreDocs:
    doc = searcher.doc(scoreDoc.doc)
    filename = doc.get("filename")
    print filename
    found_paragraph = fetch_from_my_text_library(filename)

    stream = lucene.TokenSources.getTokenStream("contents", found_paragraph, analyzer)
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream))
    highlighter = lucene.Highlighter(scorer)
    fragment = highlighter.getBestFragment(analyzer, "contents", found_paragraph)
    print '>>>' + fragment

But it all ends with an error:

Traceback (most recent call last):
  File "./search.py", line 76, in <module>
    scorer = lucene.Scorer(query, "contents", lucene.CachingTokenFilter(stream))
NotImplementedError: ('instantiating java class', <type 'Scorer'>)

So I guess this part of Lucene isn't implemented yet in PyLucene. Is there any other way to do it?

score 4 · Accepted Answer · answered Sep 23 '12 at 09:53

I got a similar error too. I think this class's wrapper is not yet implemented in PyLucene 3.6.

You might want to try the following:

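# Assumes PyLucene 3.6 (flat `lucene` namespace), an initialized JVM (lucene.initVM()),
# an open IndexSearcher `searcher`, and FIELD_CONTENTS, FIELD_PATH and QUERY_STRING
# defined elsewhere. FIELD_CONTENTS must be a stored field, otherwise doc.get() returns None.
# In PyLucene 4.x and later, classes are imported from Java-style packages instead
# (e.g. StringReader comes from java.io).
from lucene import (StandardAnalyzer, QueryParser, Version, SimpleHTMLFormatter,
                    Highlighter, QueryScorer, StringReader)

analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)

# Construct a query parser.
queryParser = QueryParser(Version.LUCENE_CURRENT, FIELD_CONTENTS, analyzer)

# Create a query.
query = queryParser.parse(QUERY_STRING)

topDocs = searcher.search(query, 50)

# Get the top hits.
scoreDocs = topDocs.scoreDocs
print "%s total matching documents." % len(scoreDocs)

# QueryScorer is a concrete scorer built from the query.
formatter = SimpleHTMLFormatter()
highlighter = Highlighter(formatter, QueryScorer(query))

for scoreDoc in scoreDocs:
    doc = searcher.doc(scoreDoc.doc)
    text = doc.get(FIELD_CONTENTS)
    # Each hit has its own text, so it needs its own token stream.
    ts = analyzer.tokenStream(FIELD_CONTENTS, StringReader(text))
    print doc.get(FIELD_PATH)
    # Print up to 3 best fragments, separated by "...".
    print highlighter.getBestFragments(ts, text, 3, "...")
    print ""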
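To give a feel for what `getBestFragments(ts, text, 3, "...")` returns, here is a rough pure-Python illustration (Python 3) of the idea: split the text into fragments, score each fragment by how many query terms it contains, wrap the matches in `<B>`/`</B>` (the default tags of `SimpleHTMLFormatter`), and join the best fragments with the separator. This is a simplified sketch, not Lucene's algorithm; the function `best_fragments`, its fixed word-window fragmenting, and its punctuation-stripping tokenizer are hypothetical and made up for illustration.

```python
import re


def normalize(word):
    # Lowercase and strip punctuation, loosely imitating what an analyzer does.
    return re.sub(r"\W+", "", word).lower()


def best_fragments(text, query_terms, max_fragments=3, separator="...",
                   words_per_fragment=5, pre="<B>", post="</B>"):
    """Toy highlighter: NOT Lucene's algorithm, just the general idea."""
    terms = {t.lower() for t in query_terms}
    words = text.split()
    fragments = []
    # Cut the text into fixed-size word windows (a crude stand-in for a fragmenter).
    for start in range(0, len(words), words_per_fragment):
        chunk = words[start:start + words_per_fragment]
        score = sum(1 for w in chunk if normalize(w) in terms)
        marked = [pre + w + post if normalize(w) in terms else w for w in chunk]
        fragments.append((score, start, " ".join(marked)))
    # Keep the highest-scoring fragments that contain at least one hit;
    # the sort is stable, so ties keep document order.
    best = sorted((f for f in fragments if f[0] > 0), key=lambda f: -f[0])
    best = best[:max_fragments]
    # Present the chosen fragments in document order, joined by the separator.
    best.sort(key=lambda f: f[1])
    return separator.join(f[2] for f in best)


if __name__ == "__main__":
    sample = ("PyLucene wraps Lucene for Python. The highlighter marks query "
              "terms in matching text. Unrelated words fill this part. Lucene again here.")
    print(best_fragments(sample, ["lucene", "highlighter"], max_fragments=2))
```

Unlike this toy, Lucene's highlighter works from token offsets, so punctuation next to a term is not wrapped in the tags.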

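Please note that a new token stream is created for each item in the search result (inside the `for scoreDoc in scoreDocs` loop), because each hit has its own text and a token stream can only be consumed once.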
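A Lucene `TokenStream` behaves much like a Python generator: once it has been read to the end, it yields nothing more. The analogy below (Python 3) shows why reusing one stream fails while creating one per document works; `token_stream` is a hypothetical stand-in, not a PyLucene function.

```python
def token_stream(text):
    """Hypothetical stand-in for a Lucene TokenStream: a one-shot generator of tokens."""
    for word in text.split():
        yield word.lower()


texts = ["Lucene in Action", "Highlighting with PyLucene"]

# Reusing a single stream: the second pass sees nothing, because the generator is exhausted.
shared = token_stream(texts[0])
first_pass = list(shared)
second_pass = list(shared)

# Creating a fresh stream per document, as the answer does for each search hit.
per_doc = [list(token_stream(t)) for t in texts]

if __name__ == "__main__":
    print(first_pass, second_pass, per_doc)
```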

Thanks! It seems that the most important part here is creating a `QueryScorer` instead of a `Scorer`. When I looked it up in Lucene's documentation, I found out that `Scorer` is an abstract class, so that's why this error came up. The name `NotImplementedError` is quite misleading here... — mik01aj, Sep 26 '12 at 18:10
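The diagnosis in the comment above can be illustrated with a Python analogy (Python 3): an abstract base class cannot be instantiated, but a concrete subclass can. The classes below are hypothetical stand-ins named after the Lucene ones, not PyLucene's wrappers; plain Python raises `TypeError` in this situation, whereas PyLucene reported the failed Java instantiation as `NotImplementedError`.

```python
from abc import ABC, abstractmethod


class Scorer(ABC):
    """Hypothetical analogue of an abstract scorer; not PyLucene's class."""

    @abstractmethod
    def score(self, token):
        ...


class QueryScorer(Scorer):
    """Hypothetical concrete subclass: scores a token by membership in the query terms."""

    def __init__(self, terms):
        self.terms = {t.lower() for t in terms}

    def score(self, token):
        return 1.0 if token.lower() in self.terms else 0.0


def try_instantiate_abstract():
    # Instantiating the abstract class fails; return the error message, or None on success.
    try:
        Scorer()
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_instantiate_abstract())
    print(QueryScorer(["lucene"]).score("Lucene"))
```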
The code works great. One thing to mention: `StringReader` is imported from `java.io`, not from `lucene`. — vancexu, Apr 22 '14 at 05:24
