If I have a fullText field with these contents:

In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]

and query for:

+loadTime:[2014 TO 2015] +fullText:sony

The Highlighter picks "2014 and 2015" as the best fragment. How do I get the highlighter to ignore matches from the loadTime part of the query and use only matches from the fullText part? I want to see the "... sony ..." fragment, even if it scores lower than the date values that just happen to appear in fullText.

My code:

ScoreDoc[] hits = [create search];
IFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
QueryScorer scorer = new QueryScorer(query, );
Highlighter highlighter = new Highlighter(formatter, scorer);

for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].Doc;
    float score = hits[i].Score;
    Document doc = search.Doc(docId);

    string fragments = string.Empty;
    if (collectFragments)
    {
        TokenStream stream = _analyzer.TokenStream("", new StringReader(doc.Get(AppConstants.Fields.FullText)));
        fragments = highlighter.GetBestFragments(stream, doc.Get(AppConstants.Fields.FullText), 2, "...");
    }

    ...
}
Ryan

1 Answer

The expression "+loadTime:[2014 TO 2015] +fullText:sony" appears to match documents whose loadTime is between 2014 and 2015 and whose fullText contains sony. I read Lucene in Action (section 3.1.2, "Parsing a user-entered query expression: QueryParser") and looked at queryparsersyntax.html, but did not find a way to write a query expression like yours. The closest I found is

loadTime:[2014 TO 2015] AND fullText:sony

That may be a version difference; I am on Lucene 3.4.0. The way to solve your problem is probably the QueryScorer constructor that takes a field name:

/**
 * @param query Query to use for highlighting
 * @param field Field to highlight - pass null to ignore fields
*/
public QueryScorer(Query query, String field) {
  init(query, field, null, true);
}

In your code, that parameter is left empty. I tried passing the field name in IntelliJ with Scala and it worked. Here is the code:

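import java.io.StringReader
import org.apache.lucene.analysis.WhitespaceAnalyzer
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryParser.QueryParser
import org.apache.lucene.search.highlight.{Highlighter, QueryScorer, SimpleHTMLFormatter, SimpleSpanFragmenter}
import org.apache.lucene.util.Version

def singleFieldHighlighter = {
    // Assumption: `version` was not defined in the original snippet; LUCENE_30 matches the analyzer call it used.
    val version = Version.LUCENE_30
    val textToDivide = "In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]"
    val tokenStream = new StandardAnalyzer(version).tokenStream("fullText", new StringReader(textToDivide))
    val searchString = "loadTime:[2014 TO 2015] AND fullText:sony"
    val parser = new QueryParser(version, "fullText", new WhitespaceAnalyzer(version))
    val parsedQuery = parser.parse(searchString)

    // Passing the field name limits highlighting to query terms on fullText.
    val scorer = new QueryScorer(parsedQuery, "fullText")
    val formatter = new SimpleHTMLFormatter("<span class='highlight'>", "</span>")
    val highlighter = new Highlighter(formatter, scorer)
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer))
    highlighter.getBestFragments(tokenStream, textToDivide, 3, "...")
}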


// The result
// In 2014 and 2015 the results were ... [more] ... and <span class='highlight'>Sony</span> are developing ... [more]
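
For intuition about what the field argument changes, here is a toy model in plain Java. It is not Lucene's implementation: `FieldFilterSketch`, `Clause`, and `demoClauses` are made-up names, and the range clause is modelled as a plain string comparison. The only point it makes is that with a null field every clause contributes highlight terms, while with "fullText" the loadTime range is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FieldFilterSketch {

    /** One query clause: a field name plus a test for whether a token matches. */
    static final class Clause {
        final String field;
        final Predicate<String> matches;

        Clause(String field, Predicate<String> matches) {
            this.field = field;
            this.matches = matches;
        }
    }

    /** Hypothetical stand-in for "+loadTime:[2014 TO 2015] +fullText:sony". */
    static List<Clause> demoClauses() {
        List<Clause> clauses = new ArrayList<>();
        // Range clause, modelled as an inclusive string comparison.
        clauses.add(new Clause("loadTime",
                t -> t.compareTo("2014") >= 0 && t.compareTo("2015") <= 0));
        // Term clause on the field we actually want highlighted.
        clauses.add(new Clause("fullText", t -> t.equals("sony")));
        return clauses;
    }

    /**
     * Wraps matching tokens in <b> tags. When field is null, every clause
     * can produce a match; otherwise only clauses on that field can.
     */
    static String highlight(String text, List<Clause> clauses, String field) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            String normalized = token.toLowerCase();
            boolean hit = false;
            for (Clause c : clauses) {
                boolean fieldOk = field == null || field.equals(c.field);
                if (fieldOk && c.matches.test(normalized)) {
                    hit = true;
                    break;
                }
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(hit ? "<b>" + token + "</b>" : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]";
        // No field filter: the years and Sony are all wrapped.
        System.out.println(highlight(text, demoClauses(), null));
        // Field filter: only Sony is wrapped.
        System.out.println(highlight(text, demoClauses(), "fullText"));
    }
}
```

In this sketch the first call marks 2014, 2015, and Sony, and the second marks only Sony, which mirrors the behaviour the question asks for.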

Hope it helps. If not, I suggest reading the section of Lucene in Action mentioned above.

Allen Chou
  • See this for how to use "+" in Lucene query syntax: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#+ . See also ["Why Not AND, OR, And NOT?"](https://lucidworks.com/blog/why-not-and-or-and-not/) for why the +/- syntax is much more expressive of Lucene's actual query behavior. – femtoRgon Jul 24 '15 at 17:37