I am working with SolrCloud and have hit a problem that can cause the indexing process to hang.
My deployment is a single collection with 5 shards running on 5 machines. Every day we run a full index of about 50M documents using the DataImportHandler; we trigger the import on one of the 5 machines and rely on SolrCloud's distributed indexing.
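For a bulk import of this size, the autoCommit settings in solrconfig.xml largely determine how often in-memory buffers get flushed to new segments. The fragment below is only illustrative (the values are assumptions, not a recommendation for this deployment):

```xml
<!-- solrconfig.xml: illustrative values only -->
<autoCommit>
  <maxDocs>100000</maxDocs>       <!-- hard-commit after this many docs... -->
  <maxTime>60000</maxTime>        <!-- ...or after 60s, whichever comes first -->
  <openSearcher>false</openSearcher> <!-- flush without reopening a searcher -->
</autoCommit>
```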
I have found that sometimes one of the 5 machines dies with:
2013-01-08 10:43:35,879 ERROR core.SolrCore - java.io.FileNotFoundException: /home/admin/index/core_p_shard2/index/_31xu.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:222)
at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:52)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:3010)
at org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:448)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:325)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
I have checked the index directory, and it indeed does not contain _31xu.fnm. I am wondering whether there is a concurrency bug in distributed indexing.
As far as I know, distributed indexing works like this: you can send documents to any shard, and each document is forwarded to the correct shard according to a hash of its id. The DataImportHandler forwards documents to the correct shard through the update handler, and finally documents are flushed to disk via DocumentsWriterPerThread. I am wondering whether too many update requests sent from the shard that triggered indexing caused the problem. My guess is based on the fact that the machine that died had a lot of index segments, each of them very small.
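The routing step described above can be sketched roughly as follows. This is not Solr's actual CompositeIdRouter (which uses MurmurHash3 and maps ids onto ranges of a 32-bit hash ring); it is just a simplified, hypothetical illustration of deterministic id-to-shard mapping:

```python
# Simplified sketch of hash-based document routing, for illustration only.
# SolrCloud's real router is more involved; here we just take a stable hash
# of the document id modulo the shard count to show the principle.
import hashlib

NUM_SHARDS = 5  # matches the 5-shard deployment described above

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document id deterministically to one of num_shards shards."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big")
    return bucket % num_shards

# Any node can receive a document; it forwards it to the leader of the
# shard that owns that id, which then sends it on to the shard's replicas.
for d in ["doc-1", "doc-2", "doc-3"]:
    print(d, "->", "shard" + str(shard_for(d) + 1))
```

Because the mapping depends only on the id, every node agrees on which shard owns a given document, regardless of which node first received it.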
I am not very familiar with Solr internals, so maybe my guess makes no sense at all. Does anyone have any ideas? Thanks.