Yet another Lucene.net question by an extreme newbie to it.

This time, I have found an interesting issue with using a query that contains a range and using highlighting.

I am writing this from memory, so please forgive any syntax errors.

I have a hypothetical Lucene index of this:

---------------------------------------------------------
|       date         |               text               |
---------------------------------------------------------
|     1317809124     |       a crazy block of text      |
---------------------------------------------------------
|     1317809284     |       programmers are crazy      |
---------------------------------------------------------

** date is a unix timestamp        

... and they have been added to the index via this:

Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("text", "some block of text", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.Add(new Lucene.Net.Documents.Field("date", "some unix timestamp", Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED));

This is how I am querying Lucene:

Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
Lucene.Net.Search.Query query = parser.Parse(queryPhrase);
Lucene.Net.Search.Hits hits = searcher.Search(query);

// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query);
Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);     

for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
    string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
    Console.WriteLine("--> " + highlightedText);
}

Here is an example of my query:

crazy AND date:[1286273266 TO 32503680000]

When this is queried, it finds all the results for "crazy" but does not output any highlighted text.

When the date range is removed and you simply query the term:

crazy

... this time highlighting works properly.

Is there something I am doing wrong in my implementation, should I be looking at a different implementation, or is this a known issue with a potential workaround?

Thank you in advance, Stack Overflow'ers :)

-- EDIT --

I have implemented the suggestions from L.B (amazing, btw!). I still have no idea why this works, as I think Lucene is complete voodoo or programming witchcraft, but it does and I am happy :).

For completeness, here is the modified code:

Lucene.Net.Analysis.Standard.StandardAnalyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(_headlinesDirectory), true);
Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);

// new line here
parser.SetMultiTermRewriteMethod(Lucene.Net.Search.MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);

Lucene.Net.Search.Query query = parser.Parse(queryPhrase);

// new line here
Lucene.Net.Search.Query query2 = query.Rewrite(searcher.GetIndexReader());
Lucene.Net.Search.Hits hits = searcher.Search(query);

// code highlighting
Lucene.Net.Highlight.Formatter formatter = new Lucene.Net.Highlight.SimpleHTMLFormatter("<span style=\"background:yellow;\">","</span>");
Lucene.Net.Highlight.SimpleFragmenter fragmenter = new Lucene.Net.Highlight.SimpleFragmenter(50);

// changed to use query2
Lucene.Net.Highlight.QueryScorer scorer = new Lucene.Net.Highlight.QueryScorer(query2);

Lucene.Net.Highlight.Highlighter highlighter = new Lucene.Net.Highlight.Highlighter(formatter, scorer);
highlighter.SetTextFragmenter(fragmenter);

for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    Lucene.Net.Analysis.TokenStream stream = analyzer.TokenStream("", new StringReader(doc.Get("text")));
    string highlightedText = highlighter.GetBestFragments(stream, doc.Get("text"), 1, "...");
    Console.WriteLine("--> " + highlightedText);
}

If you could, let me know if I have implemented the suggestions accurately.

nokturnal
  • No, it's not voodoo. This trick just converts multi-term queries such as PrefixQuery or RangeQuery into a boolean query by expanding the terms using the IndexReader. Assume you have two terms in the index, aaa1 and aaa2. A query like text:aaa* (which is a PrefixQuery) will be expanded to (text:aaa1 text:aaa2). You can verify this yourself with query.ToString(). – L.B Oct 05 '11 at 15:54
  • Thanks for the explanation and solution. I am slowly getting my Lucene.NET solution working thanks to amazing tips from people like you. Cheers and thanks again. – nokturnal Oct 05 '11 at 16:44
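
To make L.B's explanation above concrete, here is a minimal sketch in plain C# that simulates the expansion without touching Lucene.Net at all. The `RewriteSketch` class, its `ExpandPrefix` and `ExpandRange` helpers, and the hard-coded term lists are hypothetical and exist only for illustration. The real rewrite enumerates terms through the IndexReader and builds a boolean query object rather than a string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: simulates how a multi-term query is
// expanded into one clause per matching term found in the index.
public static class RewriteSketch
{
    // Expands a prefix such as "aaa" into every indexed term that starts with it.
    public static string ExpandPrefix(string field, string prefix, IEnumerable<string> indexTerms)
    {
        string[] clauses = indexTerms
            .Where(t => t.StartsWith(prefix, StringComparison.Ordinal))
            .OrderBy(t => t, StringComparer.Ordinal)
            .Select(t => field + ":" + t)
            .ToArray();
        return "(" + string.Join(" ", clauses) + ")";
    }

    // Expands an inclusive range into every indexed term that falls inside it.
    // Assumption: terms are compared as plain strings (ordinal order), which is
    // how this sketch stands in for a range over a NOT_ANALYZED string field.
    public static string ExpandRange(string field, string lower, string upper, IEnumerable<string> indexTerms)
    {
        string[] clauses = indexTerms
            .Where(t => string.CompareOrdinal(t, lower) >= 0
                     && string.CompareOrdinal(t, upper) <= 0)
            .OrderBy(t => t, StringComparer.Ordinal)
            .Select(t => field + ":" + t)
            .ToArray();
        return "(" + string.Join(" ", clauses) + ")";
    }

    public static void Main()
    {
        // The two example terms from the comment, plus one that should not match.
        string[] textTerms = { "aaa1", "aaa2", "bbb1" };
        Console.WriteLine(ExpandPrefix("text", "aaa", textTerms));
        // prints (text:aaa1 text:aaa2)

        // The two timestamps from the hypothetical index in the question.
        string[] dateTerms = { "1317809124", "1317809284" };
        Console.WriteLine(ExpandRange("date", "1286273266", "32503680000", dateTerms));
        // prints (date:1317809124 date:1317809284)
    }
}
```

The first line of output matches the (text:aaa1 text:aaa2) example from the comment; the second shows the same idea applied to the date range from the question.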

1 Answer

First invoke QueryParser's

SetMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE)

method, then create a new query as

Query newQuery = query.Rewrite(indexReader);

Now you can use "newQuery" to make your searches.

L.B
  • What sort of IndexReader do I need to use for this Rewrite call? – lurscher Dec 16 '11 at 15:15
  • @lurscher, you should already have an IndexReader. Either you created the IndexSearcher with `new IndexSearcher(indexReader)`, or you created it with `new IndexSearcher(dir)`, in which case you can get the reader with `searcher.GetIndexReader()`. – L.B Dec 16 '11 at 18:11