So I've been doing some research on the best way to implement Lucene.Net index searching and writing from within a web application. I set out with the following requirements:
- concurrent searching of the index must be allowed (queries run in parallel)
- there will be multiple indexes
- having an index search be completely up-to-date ("real-time") is NOT a requirement
- jobs will run to update the indexes on some frequency (the frequency is different for each index)
- obviously, I would like to do all of this in a way that follows Lucene "best practices" and that performs and scales well
I found some helpful resources, and a couple of good questions here on SO like this one
Following that post as guidance, I decided to try a singleton pattern with a concurrent dictionary of a wrapper built to manage an index.
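For context, the multi-index version I have in mind is roughly the following. This is only a sketch: IndexManager and IndexManagerRegistry are placeholder names I made up for illustration, and the real wrapper would own the IndexWriter and IndexSearcher for its index.

```csharp
using System;
using System.Collections.Concurrent;

// Placeholder for the per-index wrapper (hypothetical name).
// The real class would own the IndexWriter/IndexSearcher for one index.
public sealed class IndexManager
{
    public string IndexName { get; private set; }

    public IndexManager(string indexName)
    {
        IndexName = indexName;
    }
}

// Singleton registry holding one wrapper per index name (hypothetical name).
public sealed class IndexManagerRegistry
{
    private static readonly IndexManagerRegistry _instance = new IndexManagerRegistry();

    public static IndexManagerRegistry Instance
    {
        get { return _instance; }
    }

    // Lazy<T> ensures each wrapper is constructed at most once, even if
    // several threads ask for the same index at the same time.
    private readonly ConcurrentDictionary<string, Lazy<IndexManager>> _managers =
        new ConcurrentDictionary<string, Lazy<IndexManager>>(StringComparer.OrdinalIgnoreCase);

    private IndexManagerRegistry() { }

    public IndexManager Get(string indexName)
    {
        // GetOrAdd may invoke the factory more than once under contention,
        // but only one Lazy is stored, so only one IndexManager is created.
        return _managers
            .GetOrAdd(indexName, name => new Lazy<IndexManager>(() => new IndexManager(name)))
            .Value;
    }
}
```

Callers would then use something like IndexManagerRegistry.Instance.Get("test-index") and always get the same wrapper back for a given name.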
To make things simpler, I'll pretend that I am only managing one index, in which case the wrapper itself can become the singleton. The result looks like this:
public sealed class SingleIndexManager
{
    private const string IndexDirectory = "C:\\IndexDirectory\\";
    private const string IndexName = "test-index";
    private static readonly Version _version = Version.LUCENE_29;

    #region Singleton Behavior
    private static volatile SingleIndexManager _instance;
    private static readonly object syncRoot = new Object();

    public static SingleIndexManager Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null)
                        _instance = new SingleIndexManager();
                }
            }
            return _instance;
        }
    }
    #endregion

    private IndexWriter _writer;
    private IndexSearcher _searcher;
    private int _activeSearches = 0;
    private int _activeWrites = 0;

    private SingleIndexManager()
    {
        lock(syncRoot)
        {
            _writer = CreateWriter(); //hidden for sake of brevity
            _searcher = new IndexSearcher(_writer.GetReader());
        }
    }

    public List<Document> Search(Func<IndexSearcher,List<Document>> searchMethod)
    {
        lock(syncRoot)
        {
            if(_searcher != null && !_searcher.GetIndexReader().IsCurrent() && _activeSearches == 0)
            {
                _searcher.Close();
                _searcher = null;
            }
            if(_searcher == null)
            {
                _searcher = new IndexSearcher((_writer ?? (_writer = CreateWriter())).GetReader());
            }
        }
        List<Document> results;
        Interlocked.Increment(ref _activeSearches);
        try
        {
            results = searchMethod(_searcher);
        }
        finally
        {
            Interlocked.Decrement(ref _activeSearches);
        }
        return results;
    }

    public void Write(List<Document> docs)
    {
        lock(syncRoot)
        {
            if(_writer == null)
            {
                _writer = CreateWriter();
            }
        }
        try
        {
            Interlocked.Increment(ref _activeWrites);
            foreach (Document document in docs)
            {
                _writer.AddDocument(document, new StandardAnalyzer(_version));
            }
        }
        finally
        {
            lock(syncRoot)
            {
                int writers = Interlocked.Decrement(ref _activeWrites);
                if(writers == 0)
                {
                    _writer.Close();
                    _writer = null;
                }
            }
        }
    }
}
In theory, this gives me a thread-safe singleton for one index (here named "test-index") with two public methods, Search() and Write(), that can be called from within an ASP.NET web application without any thread-safety concerns. (If this is incorrect, please let me know.)
One thing is still giving me a bit of trouble: how do I gracefully close these instances in Application_End in the Global.asax.cs file, so that restarting the web application in IIS does not leave me with a bunch of write.lock failures, etc.?
All I can think of so far is:
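public void Close()
{
    lock(syncRoot)
    {
        // Write() sets _writer back to null when the last write finishes,
        // so guard against null before closing either field.
        if (_searcher != null)
        {
            _searcher.Close();
            _searcher.Dispose();
            _searcher = null;
        }
        if (_writer != null)
        {
            _writer.Close();
            _writer.Dispose();
            _writer = null;
        }
    }
}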
and calling that in Application_End, but if I have any active searchers or writers at that point, is this going to result in a corrupt index?
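One direction I have been considering is to stop accepting new operations and wait for the in-flight ones to finish before closing anything. Below is a rough sketch of that idea, kept independent of Lucene so it stands alone. OperationGate, TryEnter, Exit and the timeout are all names and choices I made up for illustration, and I have not verified how this behaves against a real index.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Lucene.Net): counts in-flight
// operations and lets Close() wait for them to drain.
public sealed class OperationGate
{
    private readonly object _sync = new object();
    private int _active;
    private bool _closed;

    // Returns false once Close() has started, so the caller can skip the work.
    public bool TryEnter()
    {
        lock (_sync)
        {
            if (_closed) return false;
            _active++;
            return true;
        }
    }

    // Must be called exactly once for every successful TryEnter().
    public void Exit()
    {
        lock (_sync)
        {
            _active--;
            if (_active == 0) Monitor.PulseAll(_sync);
        }
    }

    // Rejects new operations, waits up to 'timeout' for active ones, then
    // runs 'release' once. Returns true if everything drained in time.
    // If the wait times out, 'release' still runs while operations are active.
    public bool Close(TimeSpan timeout, Action release)
    {
        lock (_sync)
        {
            if (_closed) return _active == 0; // already closed; do not release twice
            _closed = true;

            DateTime deadline = DateTime.UtcNow + timeout;
            while (_active > 0)
            {
                TimeSpan remaining = deadline - DateTime.UtcNow;
                // Monitor.Wait releases the lock while waiting, so Exit() can run.
                if (remaining <= TimeSpan.Zero || !Monitor.Wait(_sync, remaining)) break;
            }

            bool drained = _active == 0;
            release();
            return drained;
        }
    }
}
```

Search() and Write() would wrap their work in TryEnter() and a finally block calling Exit(), and Application_End would call Close() with a delegate that closes the searcher and the writer. Whether that is enough to avoid the write.lock problem is exactly what I am unsure about.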
Any help or suggestions are much appreciated. Thanks.