
I have looked around extensively but can't find a definitive answer to this: are objects created using Apache POI HSSF reclaimed by the normal Java GC, or do I need to do something else?

I have a Java program that reads files of test data and writes xls files of data analysis. I've noticed that the working set size (as reported by Process Explorer) grows by about 3MB for each file processed, leading me to suspect that, despite explicit calls to gc, the POI objects (cells, rows, sheets) are not being reclaimed, even though there are no lingering references to them once each file is written.
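
The working set reported by Process Explorer includes heap that the JVM has reserved but not handed back to the operating system, so it can grow even when objects are being collected. A minimal sketch of logging the used Java heap after each file, to separate the two effects, might look like the following. The class name `HeapProbe` and the 3 MB scratch buffer standing in for the real per-file work are assumptions for illustration, not part of my actual program.

```java
// Hypothetical helper: reports how much of the Java heap is in use.
final class HeapProbe {
    private HeapProbe() {}

    /** Bytes of heap currently occupied (live objects plus garbage not yet collected). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Converts a byte count to mebibytes for readable logging. */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        int fileCount = 5; // assumption: stand-in for the real list of input files
        for (int i = 1; i <= fileCount; i++) {
            // Assumption: this buffer stands in for reading one file and writing one xls.
            byte[] scratch = new byte[3 * 1024 * 1024];
            scratch[0] = 1;
            scratch = null; // drop the only reference, as the real code should

            System.gc(); // only a hint; the JVM is free to ignore it
            System.out.printf("after file %d: used heap = %.1f MiB%n",
                    i, toMiB(usedHeapBytes()));
        }
    }
}
```

If the used-heap figure stays roughly flat from file to file while the working set keeps climbing, the objects are probably being reclaimed and the growth lies elsewhere; if it climbs steadily, something is likely still holding references.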

I had not anticipated this, so the code currently creates new objects (cells, etc.) every time it needs one. One suggestion I have seen says to create the needed sheets, rows, cells once, then just keep setting new values in them before writing out each xls file. Will this in fact cut down on memory usage or does setting new values into cells also eat memory?

In case it matters, I'm using poi-3.9-20121203

I'm running into a problem because I've now got several thousand files to process and wind up with out-of-memory errors. (For various reasons it's much easier if I can do them all in one pass, instead of having to do, say, 500 at a time.)

Many thanks for any recommendations and/or suggestions.

user1359010
  • One suggestion would be that in Apache POI some of the resources, such as CellStyle and Font, need to be reused; otherwise a new object will be attached to the cell every time (there is no notion of checking whether the given style has already been created). Can you please post your code and describe the inputs a bit more (size, number of rows, etc.) to avoid speculation about what could have gone wrong? – Norbert Radyk May 07 '14 at 06:22
  • Are you sure you're removing all references to the file when you're done? No lingering entries in lists or maps? No streams left open? No references passed to other libraries that might cache things? – Gagravarr May 07 '14 at 08:30
  • Norbert: thanks for the suggestion. The code is very large (and ugly), so it might not help to post it. Re: sheet size -- quite modest: three sheets, two with 200 rows of 5 cols, one with 2000 rows of 3 cols. So spreadsheet size is likely not the issue. Can you indicate the source of the claim that CellStyle and Font need to be reused, so I can see what *else* is in that category and re-use accordingly? As noted, I am prepared to re-use rather than generate new objects, if I know which have to be handled that way. – user1359010 May 07 '14 at 21:49

0 Answers