Converting HTML to PDF for large files using Google Puppeteer

Question

I get the following error when I use Puppeteer to generate a PDF file:

Error: Protocol error (Runtime.callFunctionOn): Target closed.
    at Promise (C:\Users\Rakshith.Shivaram1.MEA\Documents\EY_SARGE_PROJECT\git\26-06-2020\puppeteer-proj\git\html2pdf-puppeteer\node_modules\puppeteer\lib\cjs\puppeteer\common\Connection.js:208:63)
    at new Promise ()
    at CDPSession.send (C:\Users\Rakshith.Shivaram1.MEA\Documents\EY_SARGE_PROJECT\git\26-06-2020\puppeteer-proj\git\html2pdf-puppeteer\node_modules\puppeteer\lib\cjs\puppeteer\common\Connection.js:207:16)
    at ExecutionContext._evaluateInternal (C:\Users\Rakshith.Shivaram1.MEA\Documents\EY_SARGE_PROJECT\git\26-06-2020\puppeteer-proj\git\html2pdf-puppeteer\node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:200:50)
    at ExecutionContext.evaluate (C:\Users\Rakshith.Shivaram1.MEA\Documents\EY_SARGE_PROJECT\git\26-06-2020\puppeteer-proj\git\html2pdf-puppeteer\node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:106:27)
    at DOMWorld.evaluate (C:\Users\Rakshith.Shivaram1.MEA\Documents\EY_SARGE_PROJECT\git\26-06-2020\puppeteer-proj\git\html2pdf-puppeteer\node_modules\puppeteer\lib\cjs\puppeteer\common\DOMWorld.js:79:24)
    at process._tickCallback (internal/process/next_tick.js:68:7)

I launch Puppeteer with the code below:

const browser = await puppeteer.launch({
    pipe: true,
    args: [
        '--headless', '--disable-gpu', '--full-memory-crash-report', '--unlimited-storage',
        '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'
    ]
})

For setContent I pass a timeout:

await page.setContent(htmContent, { waitUntil: 'networkidle0', timeout: 80000 })

Is it possible to generate a large PDF file from HTML content?

The complete code, which converts small HTML content to a PDF file:

const browser = await puppeteer.launch({
                    // executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
                    headless: false,
                    pipe: true,
                    args: [
                        '--headless', '--disable-gpu', '--full-memory-crash-report', '--unlimited-storage',
                        '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'
                    ]
                }).catch((el) => {
                    console.log('browser', el);
                    next(createError(el));
                });

                const page = await browser.newPage();
                await page.setDefaultNavigationTimeout(0);
                // await page.waitFor(80000);
                await page.setRequestInterception(true);
                page.on('request', interceptedRequest => {
                    interceptedRequest.continue();
                });
                const version = await page.browser().version();
                console.log('chromium version', version);
                // var buffer = new Buffer(htmContent);
                // var bufferBase64 = buffer.toString('utf-8');
                await page.setContent(htmContent,
                    {
                        waitUntil: 'load',
                        timeout: 0
                    }).catch((ep) => {
                        console.log('setContent', ep);
                        next(createError(ep));
                    });

                // await page.setDefaultTimeout(0);
                await page.waitFor(300000).then(async () => {
                    // page.emulateMediaType('print');
                    // const pdf = await page.pdf({ fullPage: true });
                    console.log('page.waitFor(300000) done');
                    page.on('load', () => console.log('Page loaded!'));
                    const pdf = await page.pdf({ fullPage: true });
                    await page.waitFor(300000).then(() => {
                        console.log('page.waitFor(300000) done');
                        res.set('Content-Type', 'application/pdf');
                        res.header("Access-Control-Allow-Origin", "*");
                        res.send(pdf);
                    });
                });

                // const pdf = await page.pdf({ fullPage: true });
                // res.set('Content-Type', 'application/pdf');
                // res.header("Access-Control-Allow-Origin", "*");
                // res.send(pdf);

                await browser.close().catch((eb) => {
                    console.log('browser.close', eb);
                    next(createError(eb));
                });
            });

The answer to your question "is it possible to generate a large PDF file out of an HTML content?" is yes, it is possible; it will just take some time. As far as I understand, you do not have any issues with small HTML content, do you? So you have working code? For larger HTML content you may easily need a few minutes to generate the PDF. You need to make sure all the components of your application wait this long, including `setContent` and `pdf`; you may also have a server (Express or something) or a pipe from other sources, etc. — Slava Ivanov, Aug 21 '20 at 14:34
Yes, for small HTML it is working fine, but for a large file I am not able to make the code wait until the process completes (setContent, pdf). The server is Node.js, so how do I wait for the larger file? This call is not working: await page.setContent(htmContent, { waitUntil: 'networkidle0', timeout: 80000 }). And for await page.pdf({ fullPage: true }), how do I add a timeout? — Rakshith Raj S, Aug 21 '20 at 14:43
@Slava Ivanov, please let me know how to set the timeout for 'await page.pdf', since there is no documentation for this in Puppeteer — Rakshith Raj S, Aug 21 '20 at 15:09
Well, for `setContent` I believe 80 sec is more than enough; HTML of almost any size can be set in that time. `pdf` doesn't have a timeout; it will process the HTML for as long as it needs, but your application must `await` it for that time. As I mentioned, it may easily take a few minutes. So the problem is not in the component that converts, but in the pieces of code that call the conversion and have to wait for completion. You would need to debug to find the exact place that terminates the conversion process. The code posted is not enough to tell you something concrete. — Slava Ivanov, Aug 21 '20 at 15:24
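The comment above says the PDF call is simply awaited for as long as it takes. If the calling code still wants an upper bound, one option is to race the promise against a timer. This is a minimal sketch, not something from the thread: `withTimeout` is a hypothetical helper rather than a Puppeteer API, and it only stops waiting; it does not cancel the work inside Chromium.

```javascript
// Hypothetical helper (not part of Puppeteer): reject if `promise` has not
// settled within `ms` milliseconds. It stops waiting, but it does not
// cancel the underlying work.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with an existing Puppeteer `page` (assumed to be created elsewhere):
//   const pdf = await withTimeout(page.pdf({ format: 'A4' }), 5 * 60 * 1000, 'page.pdf');
```

The five-minute figure in the usage comment is only an illustration; the right bound depends on how large the documents are.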
`headless: false`: set it to true, or remove it; `pipe: true` ... what is this for? What's wrong with the socket connection? I am not sure whether there is any timeout on the connection over the pipe; Are you using "chrome" or "chromium"? There is a commented-out `executablePath`; it should work for both, but it is better to use the bundled one ("chromium"); You don't need to change `setDefaultNavigationTimeout` to zero, remove it; Why do you intercept requests if you just continue them and do nothing? That is only needed if you want to abort certain resources, images, etc. ... continue ... — Slava Ivanov, Aug 21 '20 at 16:41
`setContent` should have a decent timeout, not zero milliseconds, especially if your HTML is loading external resources like images, CSS, etc.; If `setContent` finished properly, you don't need `page.waitFor`; instead you `await page.pdf` right after it; — Slava Ivanov, Aug 21 '20 at 16:49
`fullPage: true` is not an option of `pdf` but of `screenshot`; remove it — Slava Ivanov, Aug 21 '20 at 16:53
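Taken together, the three comments above suggest a simpler flow: a finite `setContent` timeout, no `page.waitFor`, no `fullPage`, and request interception only when something is actually aborted. A rough sketch of that flow follows. The names `renderPdf`, `shouldAbort` and `BLOCKED_RESOURCE_TYPES`, the choice to block `media`, and the `A4` format are all assumptions made for illustration.

```javascript
// Resource types to skip. Blocking 'media' (audio/video) is an illustrative
// assumption; interception is only worth enabling if something is aborted.
const BLOCKED_RESOURCE_TYPES = new Set(['media']);

function shouldAbort(resourceType) {
  return BLOCKED_RESOURCE_TYPES.has(resourceType);
}

// `page` is a Puppeteer Page created by the caller.
async function renderPdf(page, html) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldAbort(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  });

  // Finite timeout for loading the markup and its external resources.
  await page.setContent(html, { waitUntil: 'networkidle0', timeout: 80000 });

  // No page.waitFor and no fullPage: await the PDF directly.
  return page.pdf({ format: 'A4', printBackground: true });
}
```

A caller would create the browser and page as in the question, then send the result of `await renderPdf(page, htmContent)` in the response and close the browser afterwards.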
In the end, all of the above is just an observation on the code provided. As far as I can see, `htmlContent` is coming from somewhere (probably a request, because you are completing a response when you send back the resulting PDF binary), so you do have a server component. This component, as I mentioned before, must allow for the time needed for the PDF conversion. Most likely the server is timing out the request sooner, and this is where you need to look to set it right. — Slava Ivanov, Aug 21 '20 at 16:59
Sorry, this is taking too much time, and the question has become too broad and requires debugging the application. I suggest paying attention to my suggestions for the code posted, but most likely they will not fix anything; the problem is with the request timeout of your server. Please take your time and review the configuration of the server you are running. — Slava Ivanov, Aug 21 '20 at 17:05
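To make the server-timeout point concrete, here is a minimal sketch using only Node's built-in `http` module. `createPdfServer` and the ten-minute default are assumptions for illustration. Node.js releases before 13 default the `http.Server` socket timeout to two minutes, while newer releases default to no timeout, so whether this applies depends on the Node version in use.

```javascript
const http = require('http');

// Build (but do not start) an HTTP server whose socket timeout leaves room
// for a slow PDF conversion. The 10-minute default is an assumption, not a
// recommendation.
function createPdfServer(handler, timeoutMs = 10 * 60 * 1000) {
  const server = http.createServer(handler);
  // Inactivity timeout applied to incoming sockets.
  server.setTimeout(timeoutMs);
  return server;
}

// With Express, app.listen() returns the same kind of http.Server:
//   const server = app.listen(3000);
//   server.setTimeout(10 * 60 * 1000);
```

Any proxy or client in front of the server has its own timeout as well, which this setting does not change.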
@Slava Ivanov thank you for the suggestion. Meanwhile, I am getting the error from the line 'await page.setContent' when trying to set the large HTML content. As per your suggestion, I am now trying to handle the Express request and response timeouts for the above conversion API — Rakshith Raj S, Aug 22 '20 at 03:47
Code added for the request timeouts: const timeout = require('connect-timeout'); app.use(timeout(400000)); app.use(haltOnTimedout); function haltOnTimedout(req, res, next) { if (!req.timedout) next(); } — Rakshith Raj S, Aug 22 '20 at 03:54
@Slava Ivanov after a lot of debugging, I have found that setContent fails for larger HTML content, so please help me resolve this for larger HTML content. Please note that I enabled headless: false for debugging and then noticed that Puppeteer struggles to load the large HTML content — Rakshith Raj S, Aug 22 '20 at 15:57
@Slava Ivanov this issue has been resolved by using page.goto for larger HTML content, since page.setContent does not work properly for larger file sizes. With page.goto all the waitUntil events work properly, so the browser does not crash during conversion. It is not an ideal solution, just a quick workaround. Thank you — Rakshith Raj S, Aug 23 '20 at 13:24
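The last comment does not say how the HTML was made reachable for `page.goto`. One possible way, sketched here purely as an assumption, is to write the markup to a temporary file and navigate to its `file://` URL. `writeHtmlToTempFile` and `htmlToPdfViaGoto` are hypothetical names, and the timeout and `A4` format are illustrative.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Write the HTML to a fresh temporary directory and return the file's
// location both as a path and as a file:// URL.
function writeHtmlToTempFile(html) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'html2pdf-'));
  const filePath = path.join(dir, 'input.html');
  fs.writeFileSync(filePath, html, 'utf8');
  return { dir, filePath, url: pathToFileURL(filePath).href };
}

async function htmlToPdfViaGoto(html) {
  // Required lazily so the helper above works without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const { dir, filePath, url } = writeHtmlToTempFile(html);
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Navigate to the file instead of calling page.setContent.
    await page.goto(url, { waitUntil: 'networkidle0', timeout: 80000 });
    return await page.pdf({ format: 'A4' });
  } finally {
    await browser.close();
    // Remove the temporary file and directory.
    fs.unlinkSync(filePath);
    fs.rmdirSync(dir);
  }
}
```

Relative links inside the HTML would resolve against the temporary directory in this setup, so markup that depends on relative assets may need absolute URLs.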
