We use NReco.PdfGenerator (version 1.2.0 on .NET 4.8) to generate letters that we send to customers. We send dozens a day, and we're running into performance problems generating them.

Right now, we have a single thread that loops through each customer and generates two letters: the original that is mailed, and a second with a watermark across the page saying "Copy" (some silly business rules, but that's the way it goes sometimes).

In my development environment, running locally as a console application, the first letter generates in about 30-40s and the second in less than 5s. Subsequent letters (for a second customer) take less than 5s each, without the long first try. That holds until I restart the application or a longer period of time passes, at which point I get one slow letter again, followed by fast ones.

In the test environment, we're running it as a WebJob in Azure and it takes 30-40s for each letter. Every time. Both the original and the copy, for each customer (and for the other PDFs that we're generating, too).

Where's the discrepancy? I've tried stripping the CSS down some (removing border-radius and box-shadow per some other posts), but I'm getting the same results. My first thought was that the library re-creates the wkhtmltopdf files every time I declare a new HtmlToPdfConverter() object (which I do for each letter). However, I tried a single static object and didn't see any change (at least, not in development; I didn't update test since it didn't seem to make a difference).

Does anyone know what rules and logic the library uses for re-creating the directory/files before starting a new generation process? Could this be a caching issue of some kind, such as images held in memory that make subsequent runs faster? Could there be a difference between debug and release builds?

Other facts that may be relevant: there is no JavaScript running on the input content, and there are no external files aside from images (no CSS, no fonts, etc.). The code that I'm using is basically as follows:

    var pdfGen = new HtmlToPdfConverter();

    pdfGen.Size = PageSize.Letter;
    pdfGen.Margins = new PageMargins { Bottom = 0, Left = 0, Right = 0, Top = 0 };
    pdfGen.CustomWkHtmlArgs = "--dpi 600";

    return pdfGen.GeneratePdf(htmlContent);

Any ideas on something that I can try to get this running faster? I'm almost to the point of going to a multi-threaded approach so that I can generate these letters in parallel, but if there's something weird going on with the creation of the wkhtmltopdf files, I'm worried about the thread safety of that process. Thoughts?
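One way to put numbers on the "slow first call, fast later calls" pattern is to time each render separately. The following is only a minimal sketch: `RenderTimer` and `TimeEach` are hypothetical names, and the `render` delegate is a stand-in for whatever produces the PDF bytes (for the code above, presumably `html => pdfGen.GeneratePdf(html)`).

```csharp
using System;
using System.Diagnostics;

public static class RenderTimer
{
    // Calls `render` once per input, in order, and records how long each call took.
    // `render` is a placeholder for the real PDF generation call (an assumption).
    public static TimeSpan[] TimeEach(Func<string, byte[]> render, params string[] inputs)
    {
        var timings = new TimeSpan[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            byte[] pdf = render(inputs[i]);
            sw.Stop();

            timings[i] = sw.Elapsed;
            Console.WriteLine("call {0}: {1:F0} ms, {2} bytes",
                i + 1, sw.Elapsed.TotalMilliseconds, pdf.Length);
        }
        return timings;
    }
}
```

Running the same harness locally and in the WebJob, with the same HTML inputs, would show whether every call is slow in Azure or only the first one per process.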

Edit 1: Here are the logs that I got when I enabled them, along with timestamps. All of the time seems to pass just before a single line ("Counting pages (2/6)") is logged, but I don't know how to interpret what is actually going on during that time.

14:12:57.941 - [>                                                           ] 0%
14:12:57.942 - [======>                                                     ] 10%
14:12:57.951 - [=======>                                                    ] 13%
14:12:58.005 - Warning: SSL error ignored
14:12:58.006 - Warning: SSL error ignored
14:12:58.006 - Warning: SSL error ignored
14:12:58.007 - Warning: SSL error ignored
14:12:58.007 - Warning: SSL error ignored
14:12:58.008 - Warning: SSL error ignored
14:12:58.016 - [==========>                                                 ] 17%
14:12:58.237 - [==================>                                         ] 30%
14:12:58.311 - [===============================================>            ] 79%
14:12:58.312 - [============================================================] 100%
14:13:34.795 - Counting pages (2/6)                                               
14:13:34.796 - [============================================================] Object 1 of 1
14:13:34.796 - Resolving links (4/6)                                                       
14:13:34.797 - [============================================================] Object 1 of 1
14:13:34.797 - Loading headers and footers (5/6)                                           
14:13:34.798 - Printing pages (6/6)
14:13:34.798 - [>                                                           ] Preparing
14:13:34.798 - [============================================================] Page 1 of 1
14:13:34.958 - Done 
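To locate the stall without reading timestamps by eye, the largest pause between consecutive log lines can be computed. This is a rough sketch that assumes every line starts with an `HH:mm:ss.fff` timestamp as in the log above; `LogGaps` and `LargestGap` are hypothetical names, and the code does not handle logs that cross midnight.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LogGaps
{
    // Returns the longest pause between two consecutive timestamped lines.
    // `lineAfterGap` receives the line that was logged right after that pause.
    public static TimeSpan LargestGap(IEnumerable<string> lines, out string lineAfterGap)
    {
        TimeSpan? previous = null;
        TimeSpan largest = TimeSpan.Zero;
        lineAfterGap = null;

        foreach (string line in lines)
        {
            // Expected shape: "14:12:58.312 - message"; anything else is skipped.
            if (line.Length < 12) continue;

            TimeSpan stamp;
            if (!TimeSpan.TryParseExact(line.Substring(0, 12), @"hh\:mm\:ss\.fff",
                    CultureInfo.InvariantCulture, out stamp))
            {
                continue;
            }

            if (previous.HasValue && stamp - previous.Value > largest)
            {
                largest = stamp - previous.Value;
                lineAfterGap = line;
            }
            previous = stamp;
        }
        return largest;
    }
}
```

For the log above, the largest pause falls between the `100%` progress line at 14:12:58.312 and the "Counting pages (2/6)" line at 14:13:34.795, roughly 36.5 seconds. The log alone does not say what wkhtmltopdf is doing during that window.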

Edit 2: Some other things that I've tried since last night.

  • I removed URL references to images (my only external resources) and replaced them with base64 representations - no change.
  • I removed the 600dpi setting - no change.
  • I had the URLs wrong in my source HTML and it was causing a redirect to the proper URLs. I changed them to be right the first time - no change.
  • Per KJ's comment, I tried generating this page with the same executable being used by NReco (copy/pasted the parameters from the comment). The first time I did it, it took 38s. I did it again about a minute later and it was instantaneous. So, something of note happening there.

I'll see if I can scrub my document enough to make it safe to post the HTML that I'm using. I will have some issues since I've got some localhost images that are specific to my environment, but I'll see what I can do.

Thanks!

Matt James
  • @k-j I've removed the dpi setting and it doesn't appear to have had an impact. On my page, at least. I'll have to see if "THIS" page makes a difference or not - looking at my page in Chrome has led me to another theory that I'm going to go explore for a few and I'll get back to you. – Matt James Jun 22 '23 at 12:31
  • @KJ I tried copy/pasting your exe/params above to generate this page. Per Edit 2 of my original post, it took 38s the first time I did it, and it was near instant the second time. I just did a third try and it was over 30s again. Not sure what to make of that. `wkhtmltopdf.exe --version` returned "wkhtmltopdf 0.12.6 (with patched qt)" for the exe that NReco generated. – Matt James Jun 22 '23 at 12:53
  • I downloaded and installed the 32-bit version of wkhtmltopdf directly from their website and it's byte-for-byte identical to what NReco provides. So, it's not that I'm getting a different exe/version. When I ran the above command for this page, it took over 30s to run. So, still an issue from there. – Matt James Jun 22 '23 at 13:11
  • Well, not necessarily. When I generated this page, I ran it directly from the command line, with no C# or NReco involvement, both with the version of the exe that NReco drops on my machine and with the one installed in Program Files. Same results from both places. – Matt James Jun 22 '23 at 14:48

1 Answer

Most likely, these variable delays in PDF generation are caused by downloading images that are referenced in the HTML template. Like a real web browser, wkhtmltopdf needs to download all linked resources (images, CSS, JS) before it can start rendering HTML content into PDF pages.

The simplest way to check whether this is the reason is to temporarily comment out these images (and all other external resources, if present) and check how fast PDFs are generated without them. If PDF generation time improves significantly, you need to check why accessing external URLs from the server where wkhtmltopdf is executed takes so much time.
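As a quick way to run that check without editing the template by hand, the `<img>` tags could be stripped from the HTML before it is passed to the converter. This is only a rough sketch: `HtmlProbe` and `StripImages` are hypothetical names, and the regular expression is deliberately crude (it does not touch CSS `url(...)` references or other resource types).

```csharp
using System.Text.RegularExpressions;

public static class HtmlProbe
{
    // Removes every <img ...> tag so rendering can be timed without images.
    // Diagnostic use only; this is not a general-purpose HTML sanitizer.
    public static string StripImages(string html)
    {
        return Regex.Replace(html, @"<img\b[^>]*>", string.Empty, RegexOptions.IgnoreCase);
    }
}
```

Generating the PDF from `HtmlProbe.StripImages(htmlContent)` and comparing the time against the unmodified HTML would show whether images account for the delay.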

Vitaliy Fedorchenko
  • Thanks. I've tried replacing all of my images with base64 representations directly embedded in the html source and it's not having an impact. There aren't any other external references. With the change, I'm still seeing long load times the same as when they were URLs instead. – Matt James Jun 22 '23 at 11:54
  • @MattJames In that case, I can suggest commenting out parts of the HTML input to determine which piece causes this strange delay in rendering. 30+ seconds for 6 PDF pages is definitely too much and there must be a reason for that. Just to say, the NReco wrapper doesn't add any measurable overhead to PDF rendering time compared to the command line. – Vitaliy Fedorchenko Jun 22 '23 at 19:13