10

So I've been doing some research on the best way to implement Lucene.Net index searching and writing from within a web application. I set out with the following requirements:

  • Need to allow concurrent searching and accessing of the index (queries run in parallel)
  • There will be multiple indexes
  • Having an index search be completely up-to-date ("real-time") is NOT a requirement
  • Jobs will run to update the indexes on some frequency (the frequency is different for each index)
  • Obviously, I would like to do all of this in a way that follows Lucene "best practices" and can perform and scale well

I found some helpful resources, and a couple of good questions here on SO like this one

Following that post as guidance, I decided to try a singleton pattern with a concurrent dictionary of a wrapper built to manage an index.

To make things simpler, I'll pretend that I am only managing one index, in which case the wrapper can become the singleton. This ends up looking like this:

public sealed class SingleIndexManager
{
    private const string IndexDirectory = "C:\\IndexDirectory\\";
    private const string IndexName = "test-index";
    private static readonly Version _version = Version.LUCENE_29;

    #region Singleton Behavior
    private static volatile SingleIndexManager _instance;
    private static object syncRoot = new Object();

    public static SingleIndexManager Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null)
                        _instance = new SingleIndexManager();
                }
            }

            return _instance;
        }
    }
    #endregion

    private IndexWriter _writer;
    private IndexSearcher _searcher;

    private int _activeSearches = 0;
    private int _activeWrites = 0;

    private SingleIndexManager()
    {
        lock(syncRoot)
        {
            _writer = CreateWriter(); //hidden for sake of brevity
            _searcher = new IndexSearcher(_writer.GetReader());
        }
    }

    public List<Document> Search(Func<IndexSearcher,List<Document>> searchMethod)
    {
        lock(syncRoot)
        {
            if(_searcher != null && !_searcher.GetIndexReader().IsCurrent() && _activeSearches == 0)
            {
                _searcher.Close();
                _searcher = null;
            }
            if(_searcher == null)
            {
                _searcher = new IndexSearcher((_writer ?? (_writer = CreateWriter())).GetReader());
            }
        }
        List<Document> results;
        Interlocked.Increment(ref _activeSearches);
        try
        {
            results = searchMethod(_searcher);
        } 
        finally
        {
            Interlocked.Decrement(ref _activeSearches);
        }
        return results;
    }

    public void Write(List<Document> docs)
    {
        lock(syncRoot)
        {
            if(_writer == null)
            {
                _writer = CreateWriter();
            }
        }
        try
        {
            Interlocked.Increment(ref _activeWrites);
            foreach (Document document in docs)
            {
                _writer.AddDocument(document, new StandardAnalyzer(_version));
            }

        } 
        finally
        {
            lock(syncRoot)
            {
                int writers = Interlocked.Decrement(ref _activeWrites);
                if(writers == 0)
                {
                    _writer.Close();
                    _writer = null;
                }
            }
        }
    }
}

In theory, this should give me a thread-safe singleton instance for an index (here named "test-index") with two publicly exposed methods, Search() and Write(), which can be called from within an ASP.NET web application with no concerns regarding thread safety. (If this is incorrect, please let me know.)

There is one thing that is giving me a little bit of trouble right now:

How do I gracefully close these instances on Application_End in the Global.asax.cs file so that if I want to restart my web application in IIS, I am not going to get a bunch of write.lock failures, etc?

All I can think of so far is:

public void Close()
{
    lock(syncRoot)
    {
        _searcher.Close();
        _searcher.Dispose();
        _searcher = null;

        _writer.Close();
        _writer.Dispose();
        _writer = null;
    }
}

and calling that in Application_End, but if I have any active searchers or writers, is this going to result in a corrupt index?
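
A minimal sketch of one way to keep such a Close() from running while searches or writes are still in flight. It uses only the .NET base class library; `ShutdownGate` and its members are hypothetical names introduced for illustration and are not part of Lucene.Net. Searches and writes take the shared (read) side of a `ReaderWriterLockSlim`, and the shutdown path takes the exclusive (write) side, so it waits for running operations and rejects new ones afterwards.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not a Lucene.Net type.
public sealed class ShutdownGate : IDisposable
{
    private readonly ReaderWriterLockSlim _gate = new ReaderWriterLockSlim();
    private bool _closed; // only read or written while holding _gate

    // Runs a search or write unless the gate has already been closed.
    public T Run<T>(Func<T> operation)
    {
        _gate.EnterReadLock(); // many operations may hold the read lock at once
        try
        {
            if (_closed)
                throw new ObjectDisposedException("ShutdownGate");
            return operation();
        }
        finally
        {
            _gate.ExitReadLock();
        }
    }

    // Waits for all running operations to finish, then runs the cleanup once.
    public void Close(Action cleanup)
    {
        _gate.EnterWriteLock(); // exclusive: blocks until every reader has exited
        try
        {
            if (_closed) return;
            _closed = true;
            cleanup(); // e.g. close the searcher first, then the writer
        }
        finally
        {
            _gate.ExitWriteLock();
        }
    }

    public void Dispose() { _gate.Dispose(); }
}
```

Under these assumptions, Search() would wrap its call as `gate.Run(() => searchMethod(_searcher))`, and Application_End would call `gate.Close(...)` with the code that closes the searcher and writer. The lock is non-recursive by default, so an operation passed to Run must not call Run again on the same thread.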

Any help or suggestions are much appreciated. Thanks.

Leland Richardson
  • Your code seems fine, but since you initialize the writer in the constructor, I would simply keep it open and remove all the initialization/locking in the Write() method. – Jf Beaulac Jul 06 '12 at 12:49
  • Would it be better to initialize the searcher off of the directory, and only open the writer when needed - if I am going to be reading much more than writing? – Leland Richardson Jul 06 '12 at 15:05
  • I don't know, I don't have much experience doing that. I usually keep my IndexWriters open for the lifetime of the application, call Commit() when I modify the index, and open the searcher using the IndexWriter.GetReader() method. – Jf Beaulac Jul 06 '12 at 15:52
  • @JfBeaulac So maybe I should change it to just have the writer open always (except for after calling close) and create a Commit() method as well? Or perhaps commit after every write? Thanks for the help btw. – Leland Richardson Jul 06 '12 at 16:16
  • @LelandRichardson FYI, Lucene.net is thread safe and you don't have to use any synchronization mechanisms(like SingleIndexManagers, locks etc.). Just create/get your IndexReaders/IndexWriters and use them. I generally open one IndexReader and one IndexWriter application-wide and use them in all threads. – L.B Jul 09 '12 at 19:35
  • @L.B Thanks. I'm aware that Readers/Writers/Searchers are thread safe (and in fact process safe, I believe) which is what makes code like above possible. The code above is mainly managing the closing/reopening and syncing of the writers and searchers. The actual writer and searcher is being shared across threads... but it allows for the reopening of the searcher after indexing documents to be thread safe as well and to make sure you don't close a writer while another thread is using it, etc.. Hope that makes sense? – Leland Richardson Jul 09 '12 at 21:25
  • You can do something like I've done here: http://stackoverflow.com/questions/14473427/singleton-pattern-for-indexwriter-and-indexsearcher-lucene-net – Amit Kumar Jan 25 '13 at 12:56

3 Answers

11

Lucene.NET is very thread safe. I can say for sure that all of the methods on the IndexWriter and IndexReader classes are thread-safe and you can use them without having to worry about synchronization. You can get rid of all of your code that involves synchronizing around instances of these classes.

That said, the bigger problem is using Lucene.NET from ASP.NET. ASP.NET recycles the application pool for a number of reasons, and while it is shutting down one application domain, it brings up another one to handle new requests to the site.

If you try to access the same physical files (assuming you are using the file-system based FSDirectory) with a different IndexWriter/IndexReader, then you'll get an error as the lock on the files hasn't been released by the application domain that hasn't been shut down yet.

To that end, the recommended best practice is to control the process that is handling the access to Lucene.NET; this usually means creating a service in which you'd expose your operations via Remoting or WCF (preferably the latter).

It's more work this way (as you'd have to create all of the abstractions to represent your operations), but you gain the following benefits:

  • The service process will always be up, which means that the clients (the ASP.NET application) won't have to worry about contending for the files that FSDirectory requires. They simply have to call the service.

  • You're abstracting your search operations at a higher level. You aren't accessing Lucene.NET directly; rather, you're defining the operations and the types that those operations require. Once you have that abstracted away, if you decide to move from Lucene.NET to some other search mechanism (say RavenDB), then it's a matter of changing the implementation of the contract.
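
As a rough illustration of what such a contract might look like, here is a minimal sketch. Every name in it (`ISearchService`, `SearchHit`, `InMemorySearchService`) is hypothetical and introduced only for this example. The in-memory class is a stand-in so the sketch runs on its own; a real implementation would live in the service process and call Lucene.NET behind the same interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: callers never see Lucene.NET's Document.
public sealed class SearchHit
{
    public string Id { get; set; }
    public string Title { get; set; }
}

// Hypothetical contract that the ASP.NET application would depend on.
public interface ISearchService
{
    IList<SearchHit> Search(string indexName, string queryText, int maxResults);
    void Index(string indexName, IEnumerable<SearchHit> items);
}

// Stand-in implementation for illustration only (simple substring match).
public sealed class InMemorySearchService : ISearchService
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, List<SearchHit>> _indexes =
        new Dictionary<string, List<SearchHit>>();

    public IList<SearchHit> Search(string indexName, string queryText, int maxResults)
    {
        lock (_sync)
        {
            List<SearchHit> items;
            if (!_indexes.TryGetValue(indexName, out items))
                return new List<SearchHit>();

            return items
                .Where(i => i.Title != null &&
                            i.Title.IndexOf(queryText, StringComparison.OrdinalIgnoreCase) >= 0)
                .Take(maxResults)
                .ToList();
        }
    }

    public void Index(string indexName, IEnumerable<SearchHit> items)
    {
        lock (_sync)
        {
            List<SearchHit> existing;
            if (!_indexes.TryGetValue(indexName, out existing))
                _indexes[indexName] = existing = new List<SearchHit>();
            existing.AddRange(items);
        }
    }
}
```

If the service were exposed through WCF, the interface would additionally carry `[ServiceContract]` and its methods `[OperationContract]`; those attributes are omitted here to keep the sketch free of extra assembly references.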

casperOne
  • Regarding file locking with the `IndexReader`/`IndexWriter`, you will generally only get a locking error if you try and open two `IndexWriter`s on an index. You can share an `IndexReader`/`IndexWriter`, or you can have *multiple* `IndexReader`s open in different threads and/or processes without any issues, even if you're writing to the index with another *single* `IndexWriter`. If the writer has committed changes, however, any open `IndexReader`s would need to be reopened to see the changes. – Christopher Currens Aug 20 '12 at 16:35
3
  • Opening an IndexWriter may be a heavy operation. You can reuse it.
  • There's a lock in Write(...) to ensure transactional behavior: all documents are added and written to disk before the method returns. The call to Commit() can be a lengthy operation (it may cause segment merges). You can move this to a background thread if you want (which introduces scenarios where some of the added documents are written in one commit and some in another).
  • There's no need for an unconditional lock in your Search(...) method. You could check if you have a _searcher instance, and use it. It is set to null in Write(...) to force a new searcher.
  • I'm not sure about your use of a searchMethod, it looks like something a collector is better suited for.


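using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public sealed class SingleIndexManager {
    private static readonly Version _version = Version.LUCENE_29;
    private readonly IndexWriter _writer;
    private volatile IndexSearcher _searcher;
    private readonly Object _searcherLock = new Object();

    private SingleIndexManager() {
        _writer = null; // TODO: create the writer once and keep it open
    }

    public List<Document> Search(Func<IndexSearcher, List<Document>> searchMethod) {
        // Read the shared field once; only lock when a new searcher is needed.
        var searcher = _searcher;
        if (searcher == null) {
            lock (_searcherLock) {
                // Another thread may have created the searcher in the meantime.
                searcher = _searcher;
                if (searcher == null) {
                    var reader = _writer.GetReader();
                    _searcher = searcher = new IndexSearcher(reader);
                }
            }
        }

        return searchMethod(searcher);
    }

    public void Write(List<Document> docs) {
        // Serialize writes so each batch is added and committed as a unit.
        lock (_writer) {
            foreach (var document in docs) {
                _writer.AddDocument(document, new StandardAnalyzer(_version));
            }

            _writer.Commit();
            // Force the next Search() to open a fresh searcher.
            // NOTE: the previous searcher is not closed here, because
            // in-flight searches may still be using it.
            _searcher = null;
        }
    }
}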
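
Setting `_searcher = null` in Write(...) drops the old searcher without closing it, since a search on another thread may still be using it. One common way to release a replaced instance safely is reference counting. The sketch below assumes only the .NET base class library (`Volatile.Read` requires .NET 4.5 or later); `SwappableResource<T>` is a hypothetical helper, not a Lucene.Net type, and it assumes the shared object implements `IDisposable` (a searcher could be wrapped in a small adapter if it does not). Lucene's `IndexReader` exposes `IncRef()`/`DecRef()` for a similar purpose.

```csharp
using System;
using System.Threading;

// Hypothetical helper: hands out a shared resource and disposes a replaced
// instance only after its last in-flight user has finished with it.
public sealed class SwappableResource<T> where T : class, IDisposable
{
    private sealed class Entry
    {
        public readonly T Value;
        public int Refs = 1; // starts with the manager's own reference
        public Entry(T value) { Value = value; }

        // Adds a reference unless the count has already dropped to zero.
        public bool TryAddRef()
        {
            while (true)
            {
                int seen = Volatile.Read(ref Refs);
                if (seen == 0) return false; // already disposed
                if (Interlocked.CompareExchange(ref Refs, seen + 1, seen) == seen)
                    return true;
            }
        }

        // Drops a reference and disposes the value when none remain.
        public void Release()
        {
            if (Interlocked.Decrement(ref Refs) == 0) Value.Dispose();
        }
    }

    private Entry _current;

    public SwappableResource(T initial) { _current = new Entry(initial); }

    // Runs the callback against the current instance, holding a reference
    // for the duration so a concurrent Swap cannot dispose it mid-use.
    public TResult Use<TResult>(Func<T, TResult> callback)
    {
        Entry entry;
        do { entry = Volatile.Read(ref _current); } while (!entry.TryAddRef());
        try { return callback(entry.Value); }
        finally { entry.Release(); }
    }

    // Publishes a new instance; the old one is disposed once unused.
    public void Swap(T replacement)
    {
        Entry old = Interlocked.Exchange(ref _current, new Entry(replacement));
        old.Release(); // give up the manager's reference to the old instance
    }
}
```

With a helper along these lines, Search(...) would call `Use(...)` and Write(...) would call `Swap(...)` with a freshly opened searcher after committing, instead of nulling the field.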
sisve
1

You can also disable overlapped recycling for the application pool in IIS (the "Disable Overlapped Recycle" setting) to avoid Lucene write.lock issues when one worker process is shutting down (but still holding the write.lock) and IIS is preparing another one for new requests.
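
For reference, a sketch of how that setting appears in applicationHost.config, under the `system.applicationHost` section. The pool name "MyAppPool" is a placeholder. The same option is shown as "Disable Overlapped Recycle" in the application pool's Advanced Settings in IIS Manager.

```xml
<!-- applicationHost.config fragment; "MyAppPool" is a placeholder name -->
<applicationPools>
  <add name="MyAppPool">
    <!-- true = stop the old worker process before starting the new one -->
    <recycling disallowOverlappingRotation="true" />
  </add>
</applicationPools>
```

The trade-off is that requests arriving during a recycle wait for the new worker process to start, rather than being served by the old one.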

Vivek