We use NReco.PdfGenerator (version 1.2.0, .NET Framework 4.8) to generate letters that we send to customers. We send dozens a day, and generation performance is causing us trouble.
Right now, a single thread loops through the customers and generates two letters for each: the original, which is mailed, and a second one with a "Copy" watermark across the page (a silly business rule, but that's the way it goes sometimes).
In my development environment, running as a local console application, the first letter takes about 30-40 s to generate and the second less than 5 s. Letters for the next customer also take less than 5 s each, with no slow first run. That holds until I restart the application or a longer period of time passes; then I get one slow letter again, followed by fast ones.
In the test environment, where it runs as an Azure WebJob, every letter takes 30-40 s. Every time: both the original and the copy, for every customer (and the same goes for the other PDFs we generate).
Where's the discrepancy? I've tried stripping down the CSS (removing border-radius and box-shadow, per some other posts), but I get the same results. My first thought was that the library recreates the wkhtmltopdf files every time I create a new HtmlToPdfConverter() object (which I do for every letter). However, I tried using a single static instance and saw no change (at least not in development; I didn't deploy it to test, since it made no difference). Does anyone know what rules the library uses to decide when to recreate the directory/files before starting a new generation? Could this be some kind of caching issue, such as images held in memory that make subsequent runs faster? Or a difference between debug and release builds?
Other facts that may be relevant: no JavaScript runs in the input content, and there are no external files aside from images (no CSS, no fonts, etc.). The code I'm using is basically as follows:
```csharp
using NReco.PdfGenerator;

var pdfGen = new HtmlToPdfConverter();
pdfGen.Size = PageSize.Letter;
pdfGen.Margins = new PageMargins { Bottom = 0, Left = 0, Right = 0, Top = 0 };
pdfGen.CustomWkHtmlArgs = "--dpi 600";
return pdfGen.GeneratePdf(htmlContent);
```
Any ideas on something I can try to make this run faster? I'm close to switching to a multi-threaded approach so I can generate the letters in parallel, but if something odd is going on with the creation of the wkhtmltopdf files, I'm worried about the thread safety of that process. Thoughts?
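If the letters do end up being generated in parallel, one way to contain the risk is to cap the concurrency and give every call its own converter rather than sharing one. The sketch below is hypothetical: `ParallelLetterGenerator`, `GenerateAll`, and the `generatePdf` delegate are illustrative names, and it assumes (unverified) that separate HtmlToPdfConverter instances can safely run wkhtmltopdf side by side, which is exactly the open thread-safety question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: generates letters in parallel, with a cap on how many
// run at once. The generatePdf delegate stands in for code that creates its
// own HtmlToPdfConverter per call. Whether NReco.PdfGenerator tolerates
// concurrent calls (including first-run setup of the wkhtmltopdf files) is an
// assumption to verify; this code does not guarantee it.
public static class ParallelLetterGenerator
{
    public static IDictionary<string, byte[]> GenerateAll(
        IEnumerable<KeyValuePair<string, string>> htmlById,
        Func<string, byte[]> generatePdf,
        int maxConcurrency)
    {
        // Thread-safe result store, keyed by letter id.
        var results = new ConcurrentDictionary<string, byte[]>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };

        Parallel.ForEach(htmlById, options, item =>
        {
            // Each call is independent; no converter instance is shared.
            results[item.Key] = generatePdf(item.Value);
        });

        return results;
    }
}
```

A call might pass `html => { var gen = new HtmlToPdfConverter(); gen.Size = PageSize.Letter; return gen.GeneratePdf(html); }` as the delegate. Note that parallelism would at best overlap the 30-40 s stalls seen in the WebJob rather than remove them.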
Edit 1: Here are the logs I got after enabling logging, with timestamps. Nearly all of the time (about 36 s) passes between the 100% loading line and the "Counting pages (2/6)" line, but I don't know how to interpret what is actually happening during that time.
```
14:12:57.941 - [> ] 0%
14:12:57.942 - [======> ] 10%
14:12:57.951 - [=======> ] 13%
14:12:58.005 - Warning: SSL error ignored
14:12:58.006 - Warning: SSL error ignored
14:12:58.006 - Warning: SSL error ignored
14:12:58.007 - Warning: SSL error ignored
14:12:58.007 - Warning: SSL error ignored
14:12:58.008 - Warning: SSL error ignored
14:12:58.016 - [==========> ] 17%
14:12:58.237 - [==================> ] 30%
14:12:58.311 - [===============================================> ] 79%
14:12:58.312 - [============================================================] 100%
14:13:34.795 - Counting pages (2/6)
14:13:34.796 - [============================================================] Object 1 of 1
14:13:34.796 - Resolving links (4/6)
14:13:34.797 - [============================================================] Object 1 of 1
14:13:34.797 - Loading headers and footers (5/6)
14:13:34.798 - Printing pages (6/6)
14:13:34.798 - [> ] Preparing
14:13:34.798 - [============================================================] Page 1 of 1
14:13:34.958 - Done
```
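To pin down where the time goes in logs like this, a small helper can report the largest gap between consecutive timestamped lines. This is a minimal sketch assuming the `HH:mm:ss.fff - message` prefix used in the log above; `LogGapAnalyzer` and `FindLargestGap` are hypothetical names, and the helper does not handle logs that cross midnight.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: finds the slowest step in a timestamped log whose
// lines look like "14:13:34.795 - Counting pages (2/6)".
// Assumes all lines fall on the same day (no midnight wrap-around).
public static class LogGapAnalyzer
{
    public struct Gap
    {
        public TimeSpan Duration;   // time elapsed between the two lines
        public string PreviousLine; // last line logged before the gap
        public string NextLine;     // first line logged after the gap
    }

    public static Gap FindLargestGap(IEnumerable<string> lines)
    {
        var best = new Gap { Duration = TimeSpan.Zero };
        TimeSpan? previousTime = null;
        string previousLine = null;

        foreach (var line in lines)
        {
            // Split "HH:mm:ss.fff - message"; skip lines that don't match.
            int sep = line.IndexOf(" - ", StringComparison.Ordinal);
            if (sep < 0) continue;

            TimeSpan time;
            if (!TimeSpan.TryParseExact(line.Substring(0, sep), @"hh\:mm\:ss\.fff",
                                        CultureInfo.InvariantCulture, out time))
                continue;

            // Keep the widest gap seen so far, with the lines on either side.
            if (previousTime.HasValue && time - previousTime.Value > best.Duration)
            {
                best = new Gap
                {
                    Duration = time - previousTime.Value,
                    PreviousLine = previousLine,
                    NextLine = line
                };
            }

            previousTime = time;
            previousLine = line;
        }

        return best;
    }
}
```

Run over the Edit 1 log, it would report a gap of about 36.5 s ending at the "Counting pages (2/6)" line, suggesting the stall happens after loading reports 100% and before page counting is logged.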
Edit 2: Other things I've tried since last night:
- I replaced the URL references to images (my only external resources) with base64 representations: no change.
- I removed the 600 dpi setting: no change.
- My source HTML had the image URLs wrong, which caused redirects to the correct ones. I fixed them so they resolve directly: no change.
- Per KJ's comment, I generated the page with the same wkhtmltopdf executable that NReco uses (copy/pasting the parameters from the comment). The first run took 38 s; when I ran it again about a minute later, it was instantaneous. So something notable is happening there.
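For reference, inlining an image as base64 (the first item in the list) amounts to building a data URI for the `img` tag's `src` attribute. A minimal sketch, assuming the caller knows the image's MIME type; `DataUri.FromBytes` is a hypothetical helper, not part of NReco.PdfGenerator.

```csharp
using System;

// Hypothetical helper: builds a data URI so an image can be embedded
// directly in the letter HTML instead of being referenced by URL.
public static class DataUri
{
    public static string FromBytes(byte[] imageBytes, string mimeType)
    {
        if (imageBytes == null) throw new ArgumentNullException(nameof(imageBytes));

        // Format: data:<mime type>;base64,<payload>
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }
}
```

For example, `DataUri.FromBytes(new byte[] { 1, 2, 3 }, "image/png")` returns `data:image/png;base64,AQID`.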
I'll see if I can scrub my document enough to post the HTML safely. Some localhost images that are specific to my environment will complicate that, but I'll see what I can do.
Thanks!