Are there any recommendations, best practices or top tips for integrating a Lucene.NET-based search into an ASP.NET MVC web application?

Things I've read (or think I've read) in various places include the following:

  • One IndexWriter, many IndexReaders
  • When the index is updated, reset/re-initialise the IndexReaders

Are there any other useful tips or resources I should read before starting?

Thanks,
Kieron


2 Answers

Here are my tips (in no particular order):

  • Choose the most appropriate locking mechanism.
  • Use SetRAMBufferSizeMB to reduce the disk I/O overhead when writing the index.
  • Don't overuse SetMaxBufferedDocs.
  • Use the Search hits (TopDocs and ScoreDoc[]) object to retrieve the index search results.
  • Index writing is an expensive operation, so use it sparingly.
  • Know the data that you will be indexing, as some data types (e.g., dates) can be difficult to search on if they are not stored consistently.
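
To make the buffer and date tips concrete, here is a minimal sketch. It assumes Lucene.NET 3.0.3 (in later versions these settings move to a separate configuration class), and the field names, the 48 MB figure and the `yyyyMMddHHmmss` format are illustrative choices rather than recommendations from the answer. Storing every date in one fixed-width UTC format makes string order match chronological order, which is what a range query relies on.

```csharp
using System;
using System.Globalization;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class DateIndexingSketch
{
    // Hypothetical helper: one fixed-width UTC format for every date in the index.
    static string ToIndexDate(DateTime value)
    {
        return value.ToUniversalTime().ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Flush by memory use; the value here is an arbitrary example.
            writer.SetRAMBufferSizeMB(48.0);

            var doc = new Document();
            doc.Add(new Field("title", "Release notes", Field.Store.YES, Field.Index.ANALYZED));
            // Dates are indexed as a single untokenized term.
            doc.Add(new Field("published",
                ToIndexDate(new DateTime(2011, 5, 24, 12, 0, 0, DateTimeKind.Utc)),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }

        // Everything published in May 2011, both bounds inclusive.
        var query = new TermRangeQuery("published", "20110501000000", "20110531235959", true, true);

        using (var reader = IndexReader.Open(dir, true))
        using (var searcher = new IndexSearcher(reader))
        {
            TopDocs hits = searcher.Search(query, 10);
            foreach (ScoreDoc hit in hits.ScoreDocs)
            {
                Console.WriteLine(searcher.Doc(hit.Doc).Get("title"));
            }
        }
    }
}
```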

A few gotchas from one of my previous projects were:

  • I had to use the BooleanQuery to do a traditional AND operation for searching multiple fields.
  • There is no in-place UPDATE functionality within Lucene, so a document needs to be deleted and re-added (IndexWriter.UpdateDocument does this delete-then-add in one call).
  • You can't sort / OrderBy on a tokenized field.
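
The three gotchas above can be sketched as follows. This assumes Lucene.NET 3.0.3; older 2.9.x builds spell some members differently (for example `BooleanClause.Occur.MUST`, and `Close()` instead of `Dispose()`). The field names (`id`, `title`, `body`, `title_sort`) and the sample documents are invented for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

class GotchasSketch
{
    // Hypothetical schema: "id" is the application's own unique key.
    static Document BuildDoc(string id, string title, string body)
    {
        var doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
        // Untokenized copy of the title, used only for sorting.
        doc.Add(new Field("title_sort", title.ToLowerInvariant(),
            Field.Store.NO, Field.Index.NOT_ANALYZED));
        return doc;
    }

    static void Main()
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(BuildDoc("1", "Lucene basics", "indexing and search"));
            writer.AddDocument(BuildDoc("2", "Advanced search", "lucene query tips"));

            // "Update": deletes every document matching the term, then adds the new one.
            writer.UpdateDocument(new Term("id", "1"),
                BuildDoc("1", "Lucene search basics", "indexing and search"));
            writer.Commit();
        }

        // AND across two fields: every clause is MUST.
        // TermQuery is not analyzed, so the terms are given in lower case.
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("title", "search")), Occur.MUST);
        query.Add(new TermQuery(new Term("body", "indexing")), Occur.MUST);

        // Sort on the untokenized field, not on the analyzed "title".
        var sort = new Sort(new SortField("title_sort", SortField.STRING));

        using (var reader = IndexReader.Open(dir, true))
        using (var searcher = new IndexSearcher(reader))
        {
            TopDocs hits = searcher.Search(query, null, 10, sort);
            foreach (ScoreDoc hit in hits.ScoreDocs)
            {
                Document found = searcher.Doc(hit.Doc);
                Console.WriteLine("{0}: {1}", found.Get("id"), found.Get("title"));
            }
        }
    }
}
```

Keeping a separate `title_sort` field costs some index space, but it lets the analyzed `title` stay searchable word by word while sorting sees the whole value as one term.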

I would suggest looking at the source code for RavenDb as it is built on top of Lucene and uses a number of best practices.

Kane
  • Great tips, cheers. Locking seems a little random to me...do you know of anywhere that details the different types of locks and their usage? – Kieron May 24 '11 at 12:01
  • You most definitely can update in Lucene -- you just find the document, update the fields and re-add it to the index with the same key. – Wyatt Barnett May 24 '11 at 13:36
  • @Wyatt, the key is the documentid, and it's never changed within a segment. Are you thinking of IndexWriter.UpdateDocument which does a delete+add? – sisve May 30 '11 at 13:48

RavenDb is definitely the easiest way to go here -- it really is lucene++.

In terms of how to use it, I'd recommend looking at the SubText blogging engine. Code is MIT licensed so you can just use it in your project and it has a very well designed index writer/reader.

In our apps, we tend to have one writer and a separate app with many readers. The locking strategy can be key here -- in particular, make sure the readers don't try to lock the index. I'm blanking on the specific term we had to use to make this happen.
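
One possible shape for the one-writer, many-readers arrangement is sketched below. It is not necessarily the setting referred to above: it simply opens the reader read-only (a read-only reader cannot modify the index, so it never needs the write lock) and swaps in a fresh reader after the writer commits. It assumes Lucene.NET 3.0.3, and `SearcherHolder` is a hypothetical helper, not part of the library.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper: shares one read-only reader between requests.
sealed class SearcherHolder : IDisposable
{
    private readonly object sync = new object();
    private IndexReader reader;

    public SearcherHolder(Directory directory)
    {
        reader = IndexReader.Open(directory, true); // true = read-only
    }

    // Call after the writer commits. Reopen() returns the same instance
    // when the index has not changed.
    public void Refresh()
    {
        lock (sync)
        {
            IndexReader fresh = reader.Reopen();
            if (!ReferenceEquals(fresh, reader))
            {
                reader.Dispose();
                reader = fresh;
            }
        }
    }

    // Searches run under the same lock to keep the sketch simple; a real
    // web application would want something less serialising.
    public TopDocs Search(Query query, int maxHits)
    {
        lock (sync)
        {
            using (var searcher = new IndexSearcher(reader))
            {
                return searcher.Search(query, maxHits);
            }
        }
    }

    public void Dispose()
    {
        lock (sync) { reader.Dispose(); }
    }
}

static class Program
{
    static Document Doc(string title)
    {
        var doc = new Document();
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    static void Main()
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var query = new TermQuery(new Term("title", "lucene"));

        // The single writer for the whole application.
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(Doc("hello lucene"));
            writer.Commit();

            using (var holder = new SearcherHolder(dir))
            {
                writer.AddDocument(Doc("more lucene"));
                writer.Commit();

                // Until Refresh() is called, the holder still sees the older commit.
                holder.Refresh();
                Console.WriteLine(holder.Search(query, 10).TotalHits);
            }
        }
    }
}
```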

Wyatt Barnett