
CAM::PDF from Chris Dolan has been a phenomenal asset for me. A recent project calls for combining more than 1,000 small PDF files into one big file.

All is well until the document grows past about 200 pages, at which point appending starts to slow down. Eventually, it takes 30 seconds or more to append each additional file.

I'm using the following code after each append, hoping to clear the cache and speed things up:

if ($PDF->needsSave()) { $PDF->cleansave(); }

I have already reduced each of the small PDF files to about 45 KB.

Short of server upgrades, is there anything else I should do on the coding side to see improvements in speed?

Thanks in advance!

  • I don't know this module, but how about patching together smaller, intermediate files? Say, make 10 files 100 pages each, then merge those? (If the module's problem is simply the document size then this can't help.) – zdim Jul 30 '18 at 19:22
  • Is there anything wrong with generating all the pages individually, and then concatenating them using [Ghostscript](https://www.ghostscript.com/) or [PDF::Reuse](https://metacpan.org/pod/PDF::Reuse)? See also [this post](https://www.nu42.com/2015/11/combine-tpp-single-document.html) where I compose a single document out of a bunch of pages. – Sinan Ünür Jul 30 '18 at 20:44
  • I also haven't used this module. I assume you had the problem before adding cleansave(), as that seems like it would make it slower, not faster. Also, have you looked at preserveOrder() to prevent some internal sorting? Lastly, can you get it to work with save(), rather than cleansave()? Otherwise, seeing some of your code might help. Also, can you put in some debugging to determine where the slowness is occurring? – UncleCarl Jul 30 '18 at 20:46
  • Not Perl, but might solve your problem: https://www.pdflabs.com/tools/pdftk-server/. If you're running Linux, it might already be available from your distribution. – Diab Jerius Aug 06 '18 at 15:03
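For reference, the external-tool routes suggested in the comments above can be done entirely from the shell. This is a sketch, assuming Ghostscript and/or pdftk are installed; the `part_*.pdf` and `combined.pdf` filenames are placeholders, not from the question.

```shell
# Ghostscript: rewrites all the inputs into one new PDF.
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
   -sOutputFile=combined.pdf part_*.pdf

# pdftk: concatenates the pages without re-rendering them.
pdftk part_*.pdf cat output combined.pdf
```

Either approach sidesteps the per-append cost in Perl entirely, at the price of an external dependency.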

1 Answer


Chris Dolan here. I've never tried using CAM::PDF at that scale, but I tested with a little 100 KB PDF and could not reproduce the slowdown. I tested with this little program:

use warnings;
use strict;
use CAM::PDF;

my $file = shift;
my $in = CAM::PDF->new($file) or die;

for my $i (0..1000) {
   print "$i\n";
   $in->appendPDF(CAM::PDF->new($file));
}

and it took about the same amount of time to append the 1000th file as the first one. Maybe there's some detail of your specific PDFs that triggers pathological behavior in the library? Without more info it's really tough to say.

Maybe the problem is that you're running out of memory and thrashing, but since the PDFs are so small I wouldn't have thought so. I assume you've tried with and without cleansave()? As @zdim says, combining them in a binary tree might help speed up some of the early combines, but it might still be very slow combining the last few nodes.
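That binary-tree idea could be sketched with CAM::PDF itself. This is untested at the questioner's scale, and the `merge_tree` helper and output filename below are my own illustration, not part of the module's API:

```perl
use strict;
use warnings;
use CAM::PDF;

# Merge documents pairwise so that no single appendPDF() call targets a
# huge accumulated document until the final rounds.
# merge_tree() is an illustrative helper, not part of CAM::PDF.
sub merge_tree {
    my @docs = @_;
    while (@docs > 1) {
        my @next;
        while (@docs) {
            my $a = shift @docs;
            if (@docs) {
                my $b = shift @docs;
                $a->appendPDF($b);    # same call as in my test program
            }
            push @next, $a;
        }
        @docs = @next;    # half as many documents per round
    }
    return $docs[0];
}

my @files = @ARGV or die "usage: $0 file1.pdf file2.pdf ...\n";
my $merged = merge_tree(map { CAM::PDF->new($_) or die "cannot parse $_" } @files);
$merged->cleanoutput('combined.pdf');
```

Note this holds every document in memory at once, so it only makes sense if the slowdown is algorithmic rather than memory pressure.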

Chris Dolan