Lucene Java opening too many files. Am I using IndexWriter properly?

Question

My Lucene Java implementation is eating up too many files. I followed the instructions in the Lucene Wiki about too many open files, but that only helped slow the problem. Here is my code to add objects (PTicket) to the index:

//This gets called when the bean is instantiated
public void initializeIndex() {
    analyzer = new WhitespaceAnalyzer(Version.LUCENE_32);
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

}


public void addAllToIndex(Collection<PTicket> records) {  
    IndexWriter indexWriter = null;
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

    try{
        indexWriter = new IndexWriter(directory, config);
        for(PTicket record : records) {
            Document doc = new Document();
            StringBuffer documentText = new StringBuffer();
            doc.add(new Field("_id", record.getIdAsString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("_type", record.getType(), Field.Store.YES, Field.Index.ANALYZED));

            for(String key : record.getProps().keySet()) {
                List<String> vals = record.getProps().get(key);

                for(String val : vals) {
                    addToDocument(doc, key, val);
                    documentText.append(val).append(" ");
                }
            }
            addToDocument(doc, DOC_TEXT, documentText.toString());        
            indexWriter.addDocument(doc);    
        }

        indexWriter.optimize();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cleanup(indexWriter);
    }
}

private void cleanup(IndexWriter iw) {
    if(iw == null) {
        return;
    }

    try{
        iw.close();
    } catch (IOException ioe) {
        logger.error("Error trying to close index writer");
        logger.error("{}", ioe.getClass().getName());
        logger.error("{}", ioe.getMessage());
    }
}

private void addToDocument(Document doc, String field, String value) {
    doc.add(new Field(field, value, Field.Store.YES, Field.Index.ANALYZED));
}

EDIT TO ADD code for searching

public Set<Object> searchIndex(AthenaSearch search) {  

    try {
        Query q = new QueryParser(Version.LUCENE_32, DOC_TEXT, analyzer).parse(query);

        //search is actually instantiated in initialization.  Lucene recommends this.
        //IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.search(q, numResults);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for(int i=start;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            ids.add(d.get("_id"));
        }
        return ids;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

This code is in a web application.

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

2) I've read that raising ulimit will help, but that just seems like a band-aid that won't address the actual problem.

3) Could the problem lie with IndexSearcher?

just increase the number of filedescriptors on your server – Thomas Jungblut Jun 19 '11 at 16:31 — Thomas Jungblut, Jun 19 '11 at 16:31

Narayan · Accepted Answer · 2011-06-22T11:45:22.943

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

i advise No, there are constructors, which will check if exists or create a new writer, in the directory containing the index. problem 2 would be solved if you reuse the indexwriter.

EDIT:

Ok it seems in Lucene 3.2 the most but one constructors are deprecated,so the resue of Indexwriter can be achieved by using Enum IndexWriterConfig.OpenMode with value CREATE_OR_APPEND.

also, opening new writer and closing on each document add is not efficient,i suggest reuse, if you want to speed up indexing, set the setRamBufferSize default value is 16MB, so do it by trial and error method

from the docs:

Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open.

also reuse the IndexSearcher,i cannot see the code for searching, but Indexsearcher is threadsafe and can be used as Readonly as well

also i suggest you to use MergeFactor on writer, this is not necessary but will help on limiting the creation of inverted index files, do it by trial and error method

All of the constructors for IndexWriter have been deprecated in Lucene 3.2 except for the one I am using. I'll check on the IndexSearcher — gmoore, Jun 20 '11 at 00:44
Does anyone know why this construction method for IndexWriter is deprecated — Rob McFeely, Nov 14 '12 at 16:37

score 1 · Answer 2 · answered Jun 19 '11 at 16:47

1

I think we'd need to see your search code to be sure, but I'd suspect that it is a problem with the index searcher. More specifically, make sure that your index reader is being properly closed when you've finished with it.

Good luck,

answered Jun 19 '11 at 16:47

Adrian Conlon

3,941
1
21
17

I'm not closing IndexSearcher because Lucene says that is okay. From their Wiki "Make sure you only open one IndexSearcher, and share it among all of the threads that are doing searches -- this is safe, and it will minimize the number of files that are open concurently." Thanks, though. – gmoore Jun 20 '11 at 00:52

score 0 · Answer 3 · answered Jun 19 '11 at 17:00

The scientific correct answer would be: You can't really tell by this fragment of code.

The more constructive answer would be: You have to make sure that there is only one IndexWriter is writing to the index at any given time and you therefor need some mechanism to make sure of that. So my answer depends of what you want to accomplish:

do you want a deeper understanding of Lucene? or..
do you just want to build and use an index?

If you answer is the latter, you probably want to look at projects like Solr, which hides all the index reading and writing.

I just want to build and use an index and I don't want to use Solr. I'll look into the multiple IndexWriters. — gmoore, Jun 20 '11 at 00:42

score 0 · Answer 4 · edited May 23 '17 at 11:59

0

This question is probably a duplicate of Too many open files Error on Lucene

I am repeating here my answer for that.

Use compound index to reduce file count. When this flag is set, lucene will write a segment as single .cfs file instead of multiple files. This will reduce the number of files significantly.

IndexWriter.setUseCompoundFile(true)

edited May 23 '17 at 11:59

Community

1
1

answered Jun 20 '11 at 10:50

Shashikant Kore

4,952
3
31
40

1

The Lucene FAQ says that setUseCompoundFile is true by default since 1.4 http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F – gmoore Jun 20 '11 at 14:12

Lucene Java opening too many files. Am I using IndexWriter properly?

4 Answers4