
I just inherited a PHP project that generates large PDF files and usually chokes after a few thousand pages and several gigabytes of server memory. The project was using PDFlib to generate these files 'in memory'.

I was tasked with fixing this, so the first thing I did was send the PDFlib output to a file instead of building the document in memory. The problem is, it still seems to be building the PDFs in memory, and much of that memory never seems to be returned to the OS. Eventually, the whole thing chokes and dies.
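For reference, this is roughly what the change looked like (a minimal sketch using the procedural PDFlib API; the filename, page size, and omitted error handling are placeholders, not the real code):

    $p = PDF_new();

    // A real filename makes PDFlib stream output to disk as it goes;
    // an empty filename ("") keeps the whole document in memory until
    // PDF_get_buffer() is called.
    PDF_begin_document($p, "/tmp/large-report.pdf", "");

    PDF_begin_page_ext($p, 595, 842, "");   // A4 portrait
    // ... place this page's content ...
    PDF_end_page_ext($p, "");

    PDF_end_document($p, "");
    PDF_delete($p);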

When I have the program build only small portions of the large PDFs, it seems the data is not fully flushed to the file on end_document(). I get no errors, yet the PDF is not readable, and opening it in a hex editor makes it obvious that the stream is incomplete.

I'm hoping that someone has experienced similar difficulties.

Jonathan Hawkes
  • Q: Does the server generate PDF documents of several thousand pages in size, or does it generate several thousand PDF files? Is the PHP program CLI or web-based? I.e., how is it started, and how long is it kept in memory? – 0scar Jun 05 '09 at 10:20
  • Many pages, not many PDFs. It's web-based (mod_php) in Apache. Some memory is given back, much is never returned until server reboot. – Jonathan Hawkes Jun 05 '09 at 14:18

2 Answers


Solved! I needed to call PDF_delete_textflow() on each textflow: textflows are given document scope and don't go away until the document is closed, which in my case never happened, since all available memory was exhausted before that point.
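For anyone who runs into the same thing, here is a minimal sketch of the pattern (procedural PDFlib API; the filename, font options, box coordinates, and the $chunks variable are placeholders): create each textflow, fit it across as many pages as it needs, then delete it right away instead of letting it live until the document is closed.

    $p = PDF_new();
    PDF_begin_document($p, "/tmp/large-report.pdf", "");

    foreach ($chunks as $text) {    // $chunks: the pieces of text to lay out
        $tf = PDF_create_textflow($p, $text,
            "fontname=Helvetica fontsize=10 encoding=unicode");

        do {
            PDF_begin_page_ext($p, 595, 842, "");
            // Place as much of the textflow as fits into this page's box.
            $result = PDF_fit_textflow($p, $tf, 50, 50, 545, 792, "");
            PDF_end_page_ext($p, "");
        } while ($result == "_boxfull");    // add pages until the flow is fully placed

        // The actual fix: release the textflow now. Otherwise it keeps
        // document scope and is only freed at PDF_end_document().
        PDF_delete_textflow($p, $tf);
    }

    PDF_end_document($p, "");
    PDF_delete($p);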

Jonathan Hawkes

You have to make sure that you are closing each page as well as closing the document, by calling end_page_ext() at the end of every page you write.
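Something like this (a sketch only; page size and content are placeholders):

    for ($i = 0; $i < $pagecount; $i++) {
        PDF_begin_page_ext($p, 595, 842, "");
        // ... place this page's content ...
        PDF_end_page_ext($p, "");    // close the page so its resources can be released
    }

    PDF_end_document($p, "");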

Additionally, if you are importing pages from another PDF, you have to call close_pdi_page() after each imported page and close_pdi_document() when you are done with each imported document.
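Roughly like this (again just a sketch; the source filename and $numpages are placeholders):

    $doc = PDF_open_pdi_document($p, "/tmp/source.pdf", "");

    for ($n = 1; $n <= $numpages; $n++) {
        $page = PDF_open_pdi_page($p, $doc, $n, "");

        PDF_begin_page_ext($p, 595, 842, "");
        PDF_fit_pdi_page($p, $page, 0, 0, "adjustpage");
        PDF_end_page_ext($p, "");

        PDF_close_pdi_page($p, $page);    // release the imported page
    }

    PDF_close_pdi_document($p, $doc);     // release the imported document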

Lee Hesselden