The PDFBox API works fine for a small number of files. But I need to merge 10,000 PDF files into one, and when I pass 10,000 files (about 5 GB) it uses about 5 GB of RAM and eventually runs out of memory. Is there an implementation in PDFBox for this requirement? I tried to tune it by using AutoClosedInputStream, which is closed automatically after it has been read, but the result is still the same.
1 Answer
I have a similar scenario here, but I need to merge only 1000 documents into a single one.
I tried to use the PDFMergerUtility class, but I was getting an OutOfMemoryError. So I refactored my code to load each source document, take its first page (my source documents have only one page each), and import that page into the target document, instead of using PDFMergerUtility. It now works fine, with no more OutOfMemoryError.
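    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.List;
    import org.apache.pdfbox.io.MemoryUsageSetting;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // PDFBox 2.x API. Each source document is buffered in a temp file, not on the heap.
    public void merge(final List<Path> sources, final Path target) {
        final int firstPage = 0; // every source document has exactly one page
        try (PDDocument doc = new PDDocument()) {
            for (final Path source : sources) {
                try (final PDDocument sdoc = PDDocument.load(source.toFile(), MemoryUsageSetting.setupTempFileOnly())) {
                    final PDPage spage = sdoc.getPage(firstPage);
                    doc.importPage(spage); // import the page into the target document
                }
            }
            doc.save(target.toAbsolutePath().toString());
        } catch (final IOException e) {
            throw new IllegalStateException(e);
        }
    }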
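
As a usage sketch, the sources list for merge could be built from a directory using only the JDK. The class name PdfSources, the method listPdfs, and the assumption that all source PDFs sit in one flat directory are hypothetical and not part of the code above. The paths are sorted so that the page order of the merged document is reproducible.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of PDFBox): collects the PDF files to merge.
final class PdfSources {

    private PdfSources() {
    }

    // Returns the regular files directly inside dir whose names end in ".pdf"
    // (case-insensitive), sorted by path. Only paths are collected here;
    // no file contents are read.
    static List<Path> listPdfs(final Path dir) throws IOException {
        // Files.list holds an open directory handle, so close it when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The result could then be passed straight to the method above, for example merge(PdfSources.listPdfs(inputDir), outputFile), where inputDir and outputFile are placeholder Path values.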

Otávio Garcia