
Lucene documentation states that single instances of IndexSearcher and IndexWriter should be used for each index in the whole application, and across all threads. Also, writes to an index will not be visible until the index is re-opened.

So, I'm trying to follow these guidelines in a multi-threaded setup (a few threads writing, many user threads searching). I don't want to reopen the index on every change. Instead, I want to keep the searcher instance no older than a certain amount of time (say, 20 seconds).

A central component is responsible for opening index readers and writers, keeping the single instances, and synchronizing the threads. I keep track of the last time the IndexSearcher was accessed by any user thread, and the time it became dirty. If anyone needs to access it more than 20 seconds after the change, I want to close the searcher and reopen it.

The problem is that I can't be sure whether earlier requests for the searcher (made by other threads) have finished yet, so I don't know when it is safe to close the IndexSearcher. If I close and reopen the single IndexSearcher instance that is shared among all threads, a search might still be running concurrently in some other thread.

To make matters worse, here's what can theoretically happen: multiple searches may be running at the same time, all the time (suppose you have thousands of users searching the same index). The single IndexSearcher instance may never become free, so it can never be closed. Ideally, I want to create another IndexSearcher and direct new requests to it, while the old one stays open and finishes the searches already requested. When the searches running on the old instance are complete, I want to close it.

What is the best way to coordinate multiple users of the IndexSearcher (or IndexWriter) so that close() is called safely? Does Lucene provide any features or facilities for this, or should it be done entirely in user code (for example, counting the threads using a searcher and incrementing/decrementing the count each time it is used)?
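The per-instance counting described above can be sketched roughly as follows. New requests are routed to a fresh instance, and the old instance is closed when its last search finishes.

This is a minimal sketch that does not depend on Lucene. `SwappableResource` and `Lease` are hypothetical names. It assumes the shared object implements `java.io.Closeable`. If your `IndexSearcher` version does not, a thin wrapper would be needed.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical helper (not a Lucene class): shares one Closeable among many
 * threads, counts users per instance, and closes a replaced instance only
 * after its last user has released it.
 */
final class SwappableResource<T extends Closeable> {

    /** One instance together with its own usage count. */
    private static final class Entry<T extends Closeable> {
        final T resource;
        int refCount = 1; // the holder's own reference, dropped on swap()

        Entry(T resource) {
            this.resource = resource;
        }
    }

    /** A borrowed instance; close it (e.g. via try-with-resources) when done. */
    final class Lease implements AutoCloseable {
        private final Entry<T> entry;
        private boolean released;

        private Lease(Entry<T> entry) {
            this.entry = entry;
        }

        T get() {
            return entry.resource;
        }

        @Override
        public void close() throws IOException {
            if (!released) { // releasing twice must not double-decrement
                released = true;
                decRef(entry);
            }
        }
    }

    private Entry<T> current;

    SwappableResource(T initial) {
        this.current = new Entry<>(initial);
    }

    /** Borrow the current instance; the count is bumped under the lock. */
    synchronized Lease acquire() {
        current.refCount++;
        return new Lease(current);
    }

    /** Send new requests to fresh; the old instance closes when its last lease is released. */
    void swap(T fresh) throws IOException {
        Entry<T> old;
        synchronized (this) {
            old = current;
            current = new Entry<>(fresh);
        }
        decRef(old); // drop the holder's own reference to the old instance
    }

    private void decRef(Entry<T> entry) throws IOException {
        boolean last;
        synchronized (this) {
            last = --entry.refCount == 0;
        }
        if (last) {
            entry.resource.close(); // nobody can still be using it
        }
    }
}
```

Each search would borrow the instance with `try (SwappableResource<...>.Lease lease = holder.acquire()) { ... lease.get() ... }`. Whatever code decides the index is stale calls `swap(newSearcher)`. A search therefore never has its searcher closed underneath it, and a steady stream of searches cannot keep the old instance alive forever, because new searches go to the new instance.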

Are there any recommendations or ideas about the design described above?

Iravanchi

3 Answers


Thankfully, recent versions (3.x or late 2.x) added a method that tells you whether anything has been written since the searcher was opened. IndexReader.isCurrent() tells you whether any changes have occurred since this reader was opened. So you can create a simple wrapper class that encapsulates both reading and writing. With some simple synchronization, that one class can manage all of this across all of the threads.

Here is roughly what I do:

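  import java.io.IOException;
  import java.util.List;
  import java.util.concurrent.atomic.AtomicInteger;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;

  public class ArchiveIndex {
      private IndexSearcher searcher;
      private final AtomicInteger activeSearches = new AtomicInteger(0);
      private IndexWriter writer;
      private final AtomicInteger activeWrites = new AtomicInteger(0);

      public List<Document> search( ... ) throws IOException {
          IndexSearcher current;
          synchronized( this ) {
              // Reopen only if the index has changed AND no search is using the old searcher.
              if( searcher != null && !searcher.getIndexReader().isCurrent() && activeSearches.get() == 0 ) {
                  searcher.close();
                  searcher = null;
              }

              if( searcher == null ) {
                  searcher = new IndexSearcher(...);
              }

              // Count this search while still holding the lock, so no other thread
              // can close the searcher between the check above and its use below.
              activeSearches.incrementAndGet();
              current = searcher;
          }

          try {
              // do your searching with 'current' and return the results
          } finally {
              activeSearches.decrementAndGet();
          }
      }

      public void addDocuments( List<Document> docs ) throws IOException {
          IndexWriter current;
          synchronized( this ) {
              if( writer == null ) {
                  writer = new IndexWriter(...);
              }
              // Count this writer while holding the lock, so the last writer out
              // cannot close the IndexWriter between this check and its use.
              activeWrites.incrementAndGet();
              current = writer;
          }
          try {
              // do your writes here using 'current'
          } finally {
              synchronized( this ) {
                  // The last writer out closes (and thereby commits) the IndexWriter,
                  // so a reopened searcher will see the changes.
                  if( activeWrites.decrementAndGet() == 0 ) {
                      writer.close();
                      writer = null;
                  }
              }
          }
      }
  }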

So I have a single class that I use for both reading and writing. This class allows reading and writing at the same time, and multiple readers can search at the same time. The only synchronization is the quick check of whether the searcher or writer needs to be reopened. I didn't synchronize at the method level. That would allow only one reader or writer at a time, which would be bad for performance.

If there are active searches, you can't drop the searcher. So if lots of readers keep coming in, they simply search without seeing the changes. Once traffic thins out, the next lone searcher will reopen the dirty searcher. This might work well for lower-volume sites where there are pauses in traffic. It can still cause starvation (i.e., you keep reading older and older results). You could add logic to stop and reinitialize if the searcher was noticed dirty more than X ago, and otherwise stay lazy as it is now. That way, searches are guaranteed never to be more than X out of date.
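The "lazy, but never older than X" rule from the previous paragraph comes down to a small decision function. The sketch below assumes a hypothetical `ReopenPolicy` class, which is not part of Lucene. Its caller passes in three values:

- the result of `searcher.getIndexReader().isCurrent()`
- the number of active searches
- the current time

Staleness is measured from when the change was first noticed, as described above. If it returns `true` while searches are still running, the old searcher must not be closed on the spot. It has to be replaced, and its last user closes it later.

```java
/**
 * Hypothetical helper (not part of Lucene): decides when a shared searcher
 * should be reopened. Lazy by default (wait until nobody is searching), but
 * once the index has been noticed dirty for maxStalenessMillis or longer,
 * it asks for a reopen even while searches are running.
 */
final class ReopenPolicy {
    private final long maxStalenessMillis;
    private long dirtySinceMillis = -1; // -1 means "searcher is current"

    ReopenPolicy(long maxStalenessMillis) {
        this.maxStalenessMillis = maxStalenessMillis;
    }

    /**
     * @param isCurrent      e.g. searcher.getIndexReader().isCurrent()
     * @param activeSearches searches still running on the current searcher
     * @param nowMillis      current time, e.g. System.currentTimeMillis()
     */
    synchronized boolean shouldReopen(boolean isCurrent, int activeSearches, long nowMillis) {
        if (isCurrent) {
            dirtySinceMillis = -1;
            return false;
        }
        if (dirtySinceMillis < 0) {
            dirtySinceMillis = nowMillis; // first time we notice the change
        }
        boolean idle = activeSearches == 0;
        boolean tooStale = nowMillis - dirtySinceMillis >= maxStalenessMillis;
        return idle || tooStale;
    }

    /** Call after a new searcher has been opened. */
    synchronized void reopened() {
        dirtySinceMillis = -1;
    }
}
```

With `new ReopenPolicy(20_000)`, a dirty index is reopened as soon as it is idle, and after 20 seconds at the latest.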

Writers can be handled in much the same way. I make a point of closing the writer periodically (which commits it) so the reader will notice that the index has changed. I didn't do a very good job describing that, but it works much like searching. If there are active writers, you can't close the writer. If you're the last writer out the door, close the writer. You get the idea.

chubbsondubs
  • The basic idea you're presenting is to count the active searches, as I wrote at the end of my post. As I said, I don't want to reopen each time the index gets dirty, so some timing mechanism should also be added to the above. It's actually close to what I was planning to do, but I'm wondering whether Lucene itself provides anything for closing an index. It should be easy for the Lucene engine to tell us if something is going on in another thread. – Iravanchi Nov 19 '11 at 19:17
  • BTW, +1 to your answer, but I'm waiting for more ideas. And I guess the code has some synchronization issues (like the last finally block should be in synchronized(this) too), I suggest you fix them (if any) in case anyone else comes along and uses the code. – Iravanchi Nov 19 '11 at 19:19
  • Actually the code is fine according to the documents. It's fine to allow multiple threads to access the IndexWriter "NOTE: IndexWriter instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the IndexWriter instance as this may cause deadlock; use your own (non-Lucene) objects instead." – chubbsondubs Nov 20 '11 at 03:48
  • Yes, writing to the index is perfectly thread safe and concurrent in Lucene. I mean your last `finally` block, where you close the writer. Since it is not synchronized, after `activeWrites.decrementAndGet()` another thread may enter the method and start writing, and the current thread will go on and close the writer. – Iravanchi Nov 20 '11 at 19:47
  • Thanks I updated it. Almost considered putting in a compile error so people might think twice about xeroxing this code without thinking because I did this from memory. It's not actual tested code. – chubbsondubs Nov 22 '11 at 20:46

There is a relatively new SearcherManager class which takes care of this problem and can hide the IndexReader from your code entirely. Though the API is possibly subject to change, I see this as greatly simplifying things.

A basic tutorial from Mike McCandless, a Lucene project committer: http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html

Matt Ball
  • +1 Thanks for mentioning this. Actually I ended up using this in my code after I posted the question, but forgot to put a pointer to it in this page. – Iravanchi Jul 26 '12 at 16:58

You would only want to create a new reader if the actual index has changed. What I did was keep a reference to an IndexReader and drop it after reindexing. That's because I want to be able to search during indexing, and I believe you can't open an IndexReader while writing (correct me if I'm wrong).

I let the application create a new reader if there is none available, so it's sort of a caching that gets disposed after each index commit.
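The "create on demand, drop after each commit" cache described above could be sketched as follows. `CommitScopedCache` is a hypothetical name. The opener is left abstract as a `Supplier`, and in practice it would open an `IndexReader`.

Note that `invalidate()` only forgets the instance. It hands the instance back so that the caller can close it once no search is still using it.

```java
import java.util.function.Supplier;

/** Hypothetical helper: lazily opens a value and forgets it after each commit. */
final class CommitScopedCache<T> {
    private final Supplier<T> opener;
    private T cached;

    CommitScopedCache(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the cached instance, opening a new one if none is available. */
    synchronized T get() {
        if (cached == null) {
            cached = opener.get();
        }
        return cached;
    }

    /**
     * Call after each index commit. Returns the dropped instance (may be null)
     * so the caller can close it once it is no longer in use.
     */
    synchronized T invalidate() {
        T old = cached;
        cached = null;
        return old;
    }
}
```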

If you need real-time indexing capabilities (searching among the currently indexed entities during an indexing operation), you can get an IndexReader from the current IndexWriter using the getReader() method.

jishi
  • You can open any number of read only `IndexReader`s and one that can write concurrently, as far as I know. My question does not concern creating the searchers, it concerns closing them and determining when it is safe to do so. – Iravanchi Nov 19 '11 at 19:13
  • You can't write with a Reader. And yes, you can open concurrent readers, but that has some overhead and you can read concurrently and they are thread safe so there is no reason to have multiple readers unless you have special reasons for it. When you read from a Reader, you read from the state of the index from when the Reader was created, meaning it would still function even during a re-indexing operation (consider what would happen if you do a DeleteAll() before re-indexing, next Reader would be an empty index). – jishi Nov 21 '11 at 08:40