
A big bottleneck I have at the moment is PDF generation each time someone places an order. It's not a big deal for a single order, but when there are a lot in a short time frame, this process is very slow.

The PDF needs text information, a QR code, a Bar code, a logo, and 1 or more (up to 20+) 1/4-width images.

Current process w/ DOMPDF:

  1. QR code image created w/ PHP and saved as png
  2. Bar code image created and saved as png
  3. DomPDF generates PDF

New thought:

  1. HTML2PDF creates the PDF and uses its QR and bar code tags to generate the codes

That theoretically would take care of the QR and Barcode images, but still, the rest of the images make it too slow.

Doing it this way, with no images other than the QR and bar code, the PDF generates in ~500ms, but as soon as I start adding images, it goes up to 2, 3, 4, 5+ seconds each.


When running tests, and processing ~10k orders (in a few minutes), it was still processing the PDFs around 12 hours later until I just shut it down in frustration.

The PDF is generated in a separate queue process, so the person doesn't have to wait when ordering, but - still... it can't take 5+ hours for them to receive their confirmation PDF during high traffic.
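
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. The 10,000 orders and the 0.5 s and 5 s per-PDF costs are the figures above; the worker counts are hypothetical, and it assumes workers scale linearly, which a real queue may not.

```php
<?php
// Back-of-the-envelope queue drain time. Order count and per-PDF
// timings are taken from the question; worker counts are hypothetical.

/** Hours needed to render $orders PDFs at $secondsPerPdf each, spread over $workers parallel workers. */
function drainHours(int $orders, float $secondsPerPdf, int $workers = 1): float
{
    return ($orders * $secondsPerPdf) / $workers / 3600;
}

printf("5 s per PDF, 1 worker:   %.1f h\n", drainHours(10000, 5.0));
printf("5 s per PDF, 8 workers:  %.1f h\n", drainHours(10000, 5.0, 8));
printf("0.5 s per PDF, 1 worker: %.1f h\n", drainHours(10000, 0.5));
```

At 5 seconds each, a single worker needs about 13.9 hours for 10k orders, which is consistent with the queue still running 12 hours later. Getting back to ~0.5 seconds per PDF would bring that to roughly 1.4 hours even without extra workers.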


Questions / TLDR:

How can I make my process of creating PDFs with a dynamic qr code, a dynamic bar code, dynamic text, and 1-20 static images (images are same across all PDFs) faster?

Are there other potential things I haven't thought of? Maybe making a template PDF and somehow use PHP to just fill in the dynamic spots?

Dave
  • DOMPDF is using GD? At a guess I think imagick could be faster. However, processing huge PDFs with lots of images in PHP is slow and memory consuming. We've implemented a solution that generates LaTeX files and then prints them to PDF. You could try that as well. For the image processing you could try to use imagick instead of GD. The question *is* too broad, you'll have to try and benchmark different solutions. I doubt it is possible to give a "right" answer. It might be a good idea to profile your current code as well to see where exactly it is slow. – floriank Sep 16 '14 at 02:08
  • Any suggestions on HOW to be more specific are certainly welcome. I'm not looking for a "you're missing a semi-colon" type answer for sure, but - I've described the process and problem in detail and would love any suggestions, ideas...etc. I'm at a loss and have expended all my other resources. Not sure where else to go other than asking here. :( There must be some way to create PDFs with images faster. Both Html2PDF and DomPDF are slow as snails the minute I start adding images. I thought maybe the template idea had some merit, but - can't find anything there either. – Dave Sep 16 '14 at 02:14
  • Profile your script and see where it is slow, I bet on the image manipulation part of it. Then try to improve it. That's the best I could think of right now. You can try the latex solution as well, it is not that hard to implement. Example: https://www.sharelatex.com/learn/Inserting_Images However I've generated a 21 page repair report with lots of photo documentation 7 years ago and it was working fine at this time with TCPDF (or something else) so it should be possible to do that today as well. But I haven't had to process 10k of reports a day. – floriank Sep 16 '14 at 02:16
  • 2 close votes - REALLY? I don't get it. This is a very valid question for a very valid (and likely common enough) problem. It might not be an easy answer, and it might not be a black & white answer, but geez - just because a question can't be answered in 10 seconds by someone doesn't mean it's invalid or doesn't belong on SO. There's even a "performance" tag with 10k+ uses. If not asked here, and not something that can be found on Google, then what else is there? THIS seems like the PERFECT place for a question like this. – Dave Sep 16 '14 at 02:23
  • @burzum - I'll try looking into the latex thing - I have never heard of that nor understand what it is exactly, but I appreciate the link/info. It's not going to be easy IMO to profile my code, as it's not MY code that's causing it to run slow - it's something in both PDF libraries. – Dave Sep 16 '14 at 02:24
  • Why is it not easy? Use xdebug and http://kcachegrind.sourceforge.net/html/Home.html The graphical representation and sorting makes it dead easy to spot slow parts. Latex is commonly used to write scientific publications. It is solid and accurate. Think of it as HTML for text documents. It's not really hard. If you ever dealt with Word and Open Office documents as templates to generate Pdfs from, especially via php you'll appreciate it. – floriank Sep 16 '14 at 02:53
  • I did not down vote but it is the type of question that elicits opinion. So ... because of opinion .. IMHO XSL FO was meant for such things. I have done installations much larger and faster than this with much more complex documents. – Kevin Brown Sep 16 '14 at 05:01
  • This question being closed is ridiculous. – Dave Sep 16 '14 at 11:55
  • Have you tried TCPDF? It works very fast with images. Or I would suggest the [WKHTMLTOPDF](http://wkhtmltopdf.org/) library. You can make your data pulling and processing faster using some caching mechanism. – Anand G Sep 16 '14 at 12:29
  • @AnandGhaywankar - thanks, will look into those. data pulling/processing is all taken care of - just the generation that's the issue now. Thanks for info. – Dave Sep 16 '14 at 13:04
  • @AnandGhaywankar - The down-side with other libraries like TCPDF and WK.. is it doesn't appear to create QR and Bar codes, which seems like something that would lower my time pretty dramatically. – Dave Sep 16 '14 at 13:52
  • TCPDF generates barcodes and QR codes: http://www.tcpdf.org/doc/code/classTCPDFBarcode.html, http://www.tcpdf.org/doc/code/classQRcode.html – bancer Sep 16 '14 at 19:43
  • Regarding closing the question - I feel the smell of politics here. I can't agree that this question cannot be answered in a few paragraphs. – bancer Sep 16 '14 at 19:50
  • You don't say how the PDFs are used, but ... if they are being used in a closed environment, OPI is exactly what this kind of problem was invented for. All you need then is an application that can generate PDFs with OPI and the other features you require. I thought it worth throwing it in the pot, just in case it's of help. [OPI is a feature common in printing where a low resolution proxy image is placed in the PDF, while the high res image is on a server that is accessible to the final printing process.] – John Jefferies Sep 16 '14 at 20:34
  • @bancer - 1) awesome about TCPDF making bar and qr codes! That, along w/ the recommendation that it "works very fast with images" means I think that's my next attempt! Thanks/I agree about the close comment, but - appears it's been re-opened (yay!) – Dave Sep 16 '14 at 20:38
  • Thanks, @JohnJefferies. It's not, but that's good info to have anyway. – Dave Sep 16 '14 at 20:39
  • I know this is an old question, but I have tested many things with DOMPDF and read many posts about slow processing of images, and I have found that using PNG or GIF images in Dompdf makes processing of the file very slow. If you don't need transparency (alpha) on images, use only JPG images and you'll see that PDF generation is much faster with Dompdf. – otinanai May 21 '15 at 00:40

2 Answers


I would strongly advise you to use the TCPDF library. It's quite fast and can be easily integrated into CakePHP. You can find a lot of examples of how to include images, barcodes and QR codes in a PDF on the TCPDF examples page.

To further improve the performance use tips from this page:

  • Install and configure a PHP opcode cache like XCache;
  • Edit the php.ini file and increase the maximum amount of memory a script may consume (memory_limit);
  • Edit the php.ini file and increase the maximum execution time of each script (max_execution_time);
  • Edit the config/tcpdf_config.php file: manually set the $_SERVER['DOCUMENT_ROOT'], K_PATH_MAIN and K_PATH_URL constants, and remove the automatic calculation part;
  • If you are not using the Thai language, edit the config/tcpdf_config.php file and set the K_THAI_TOPCHARS constant to false;
  • If you do not need extended chars, edit the config/tcpdf_config.php file and set the default fonts to core fonts;
  • If you do not need UTF-8 Unicode, set the $unicode parameter on the TCPDF constructor to false and the $encoding parameter to 'ISO-8859-1' or another character map;
  • By default TCPDF enables font subsetting to reduce the size of embedded Unicode TTF fonts; this process is very slow and requires a lot of memory, and it can be turned off with the setFontSubsetting(false) method;
  • Use core fonts instead of embedded fonts whenever possible;
  • Avoid using the HTML syntax (writeHTML and writeHTMLCell methods) if not strictly required;
  • Split large HTML blocks in smaller pieces;
  • Avoid using transactions if not strictly required;
  • Restart the webserver after changes.
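
A minimal sketch of how several of these tips might combine for the order PDF described in the question. It assumes TCPDF is installed through Composer; the function name, the `$order` keys, the file paths and the layout coordinates are all hypothetical, and it is untuned illustration rather than a measured solution.

```php
<?php
// Sketch only: assumes TCPDF is available via Composer's autoloader.
// buildOrderPdf(), the $order keys and all coordinates are hypothetical.
require __DIR__ . '/vendor/autoload.php';

function buildOrderPdf(array $order, array $imagePaths, string $outFile): void
{
    // $unicode = false with ISO-8859-1 follows the tip above; keep the
    // defaults (true, 'UTF-8') if the text needs characters outside Latin-1.
    $pdf = new TCPDF('P', 'mm', 'A4', false, 'ISO-8859-1');
    $pdf->setFontSubsetting(false);      // skip the slow subsetting pass
    $pdf->setPrintHeader(false);
    $pdf->setPrintFooter(false);
    $pdf->SetFont('helvetica', '', 11);  // core font, nothing to embed
    $pdf->AddPage();

    // Plain Cell() calls instead of writeHTML().
    $pdf->Cell(0, 8, 'Order ' . $order['number'], 0, 1);
    $pdf->Cell(0, 8, $order['customer'], 0, 1);

    // Barcodes are drawn directly into the page, so no intermediate
    // PNG files have to be generated and saved first.
    $pdf->write1DBarcode($order['number'], 'C128', 15, 40, 80, 18);
    $pdf->write2DBarcode($order['url'], 'QRCODE,L', 150, 35, 40, 40);

    // Static images, four per row. Pagination for long image lists
    // is not handled in this sketch.
    $x = 15; $y = 85; $w = 42; $gap = 4;
    foreach (array_values($imagePaths) as $i => $path) {
        $col = $i % 4;
        $row = intdiv($i, 4);
        $pdf->Image($path, $x + $col * ($w + $gap), $y + $row * ($w + $gap), $w);
    }

    // 'F' saves to a file; an absolute path is the safest choice.
    $pdf->Output($outFile, 'F');
}
```

Timing this function with and without the image loop, for example with `microtime(true)` around the call, should show whether the images are still the dominant cost after switching libraries.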

If that does not improve the performance to an acceptable level, you can install your CakePHP application (or just the script that generates the PDFs, if it doesn't use CakePHP) on a second server with more available resources and use that server only for PDF generation.

bancer
  • Thanks very much for the detail. I've heard a few people mention it's "fast", but have you found any data to back that up? Will try TCPDF asap and mark as answer if it works! :) – Dave Sep 16 '14 at 22:38

You can try to use JPEG instead of PNG files if you don't need transparency.

For example, in TCPDF, I had to generate a PDF with a big PNG as the background (18cm x 18cm, 300dpi). I had to wait 11 seconds before the file was generated. I replaced the image with a JPEG of the same size and DPI, and it took less than 1 second.
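
Since the images in the question are the same across all PDFs, the conversion only has to happen once, ahead of time. Below is a minimal sketch of such a one-off conversion using PHP's GD extension; the function name and the quality value of 85 are illustrative choices, and transparent areas are simply flattened to white.

```php
<?php
// Sketch: flatten a PNG onto a white background and save it as a JPEG.
// Requires the GD extension. pngToJpeg() and the default quality are
// illustrative, not taken from any library.

function pngToJpeg(string $pngPath, string $jpegPath, int $quality = 85): void
{
    $png = imagecreatefrompng($pngPath);
    if ($png === false) {
        throw new RuntimeException("Cannot read PNG: $pngPath");
    }

    $width  = imagesx($png);
    $height = imagesy($png);

    // JPEG has no alpha channel, so paint a white canvas first and
    // composite the PNG on top; transparent areas become white.
    $canvas = imagecreatetruecolor($width, $height);
    $white  = imagecolorallocate($canvas, 255, 255, 255);
    imagefilledrectangle($canvas, 0, 0, $width - 1, $height - 1, $white);
    imagecopy($canvas, $png, 0, 0, 0, 0, $width, $height);

    if (!imagejpeg($canvas, $jpegPath, $quality)) {
        throw new RuntimeException("Cannot write JPEG: $jpegPath");
    }
}
```

Running this once per static image at deploy time, and pointing the PDF code at the resulting JPEG files, keeps the conversion cost out of the per-order path.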

Yorick
  • I can confirm this. Previously, using PNG took 2+ seconds; after switching to JPG, it went down to 0.1+ ms. Not bad for a quick fix. – RZKY Nov 14 '20 at 14:29