
I am using Lucene.NET 2.9 in one of my projects to index documents and search them. One field in my documents is text-heavy, so I store it in my MS SQL database rather than in the Lucene index. In other words, I search via the Lucene index and then fetch the complete documents from the MS SQL database.

The problem I am facing is that I want to highlight the search query terms in the results. For that I am using FastVectorHighlighter. This highlighter requires the Lucene doc ID and the field name in order to highlight a field. Because the text-heavy field is not stored in the Lucene index, it is not highlighted in my search results.

Any suggestions on how to accomplish this? One option is to add the field to the Lucene index; that would solve the problem but would make the index very large. Alternatively, if there is another way to highlight the text, it would give me much more flexibility.

Thank you for reading my question, Naveen

Naveen

1 Answer


If you don't want to store the text in the Lucene index, you should use the Highlighter contrib package.

The latest sources for it can be found at https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Highlighter/

Jf Beaulac
  • What kinds of queries does this support, and how does its performance compare to FastVectorHighlighter? Also, does it need offsets stored at index time, as FastVectorHighlighter requires? – Naveen May 18 '11 at 18:51
  • It supports almost all queries; if I remember correctly, the only one I had to implement support for was FuzzyQuery. You don't need offsets in the index. It is significantly slower than FastVectorHighlighter, though, since you need to retrieve the text from another source and tokenize it to get it highlighted. – Jf Beaulac May 19 '11 at 21:18
  • When I search for a phrase, the highlighter adds formatting tags around each term rather than around the whole phrase. Is it possible to make it wrap the whole phrase in a single pair of tags? – Naveen May 29 '11 at 14:27
  • The approach suggested by the Highlighter package author is to post-process phrases to merge the highlighted terms: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200906.mbox/%3c4A27AEDC.6040902@gmail.com%3e – Jf Beaulac May 30 '11 at 15:25
  • Yeah, I guess it would be much easier for me to do it on the client (via regex + JS). Since at any given time I have about 20-30 results, each about 300 characters long, I guess that would be easier. Do you think this approach is right? – Naveen Jun 02 '11 at 05:36
  • Is SimpleSpanFragmenter not part of the .NET port of Lucene? I couldn't find it in any of the contrib modules. – Naveen Jun 02 '11 at 13:07
  • Doing the post-processing on the client is, in my opinion, a good approach. – Jf Beaulac Jun 08 '11 at 19:06
  • Thanks! Any idea about SimpleSpanFragmenter? I didn't find it in the .NET port. – Naveen Jun 09 '11 at 04:29
  • I don't see it in SVN; it probably has not been ported yet. Porting it should be simple, though, as it seems to be a small class. – Jf Beaulac Jun 09 '11 at 14:58
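
For the client-side post-processing discussed in the comments, a minimal sketch might look like the following. It assumes the highlighter wraps each matched term in `<b>`...`</b>`; those tags, and the helper names `escapeRegExp` and `mergeAdjacentHighlights`, are illustrative assumptions and not part of Lucene.NET or the Highlighter contrib. Substitute whatever pre/post tags your formatter actually emits.

```javascript
// Hypothetical helper: escape a literal string for use inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: merge highlighted terms that are separated only by
// whitespace, so "<b>quick</b> <b>brown</b>" becomes "<b>quick brown</b>".
// openTag/closeTag are assumptions about the formatter's output.
function mergeAdjacentHighlights(html, openTag = "<b>", closeTag = "</b>") {
  // Match a closing tag, a run of whitespace, then an opening tag,
  // and keep only the whitespace.
  const gap = new RegExp(
    escapeRegExp(closeTag) + "(\\s+)" + escapeRegExp(openTag),
    "g"
  );
  return html.replace(gap, "$1");
}

// Usage: run each result snippet through the helper before rendering it.
const snippet = "the <b>quick</b> <b>brown</b> fox";
console.log(mergeAdjacentHighlights(snippet));
```

Note that this merges any two highlighted terms that happen to sit next to each other, whether or not they came from the same phrase query. For 20-30 short snippets per page that is probably acceptable, but it is a simplification.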