Puppeteer cluster closes tabs before I can take a screenshot

I am using puppeteer-cluster with maxConcurrency 8. I need to take a screenshot after each page loads (approx. 20,000 URLs). page.screenshot is not useful for me: my screenshot must include the URL bar and the desktop, so it is basically a full desktop screenshot. I am therefore using ImageMagick to take the screenshot (and xvfb to manage multiple screens).

The problems are:

  1. Sometimes the screenshot is taken before the browser has switched to the right tab.
  2. Sometimes the screenshot is blank, because the current tab was closed and a tab that has not loaded yet came to the front.
  3. Sometimes an error is thrown because the screenshot could not be taken: all the tabs had already been closed.

What I am doing is this: when each page loads, I call page.bringToFront and spawn a child process, which takes a screenshot of the desktop using ImageMagick's import command.

cluster.queue(postUrl.href); // add URLs to the queue
await page.waitForNavigation(); // wait for the page to load before the screenshot

// take the screenshot
const { spawnSync } = require('child_process');
const child = spawnSync('import', ['-window', 'root', path]);

I don't want to add a fixed wait time after page load. The ImageMagick package for Node.js didn't work, and using a promise didn't seem to work either.

I do not want Puppeteer to close the tab on its own. Instead, can it fire a callback once the page is loaded, wait for that callback function to run and return, and only then close the tab?

lorenz

1 Answer

As soon as the Promise returned by the function you pass to cluster.task resolves, the page is closed:

await cluster.task(async ({ page, data }) => {
    // when this function is done, the page will be closed
});

To keep the page open, await another Promise at the end of the task function:

await cluster.task(async ({ page, data }) => {
    // ...
    await new Promise(resolve => {
        // more code...
        // call resolve() when you are done
    });
});
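
Applied to the question, that extra Promise could wrap the external screenshot command, so the page stays open until the import process has exited. The following is only a sketch under assumptions: the names runCommand and screenshotTask are made up for illustration, and each queued job is assumed to be an object of the form { url, path } rather than a bare URL string.

```javascript
const { spawn } = require('child_process');

// Run an external command; resolve when it exits with code 0, reject otherwise.
function runCommand(command, args) {
    return new Promise((resolve, reject) => {
        const child = spawn(command, args, { stdio: 'ignore' });
        child.on('error', reject); // e.g. the binary is not installed
        child.on('close', (code) => {
            if (code === 0) resolve();
            else reject(new Error(`${command} exited with code ${code}`));
        });
    });
}

// Hypothetical task body. The page is only closed after this function's
// Promise resolves, i.e. after the screenshot command has finished.
// `capture` is injectable so the function can be tried without ImageMagick.
async function screenshotTask({ page, data }, capture = runCommand) {
    await page.goto(data.url, { waitUntil: 'load' });
    await page.bringToFront();
    // Same ImageMagick invocation as in the question: grab the root window.
    await capture('import', ['-window', 'root', data.path]);
}
```

It would be registered with cluster.task(screenshotTask) and fed with something like cluster.queue({ url: postUrl.href, path: '/tmp/shot.png' }), where the output path is a placeholder. Using the asynchronous spawn instead of spawnSync keeps the Node.js event loop free while the screenshot is taken.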

Calling resolve() settles that last Promise, which lets the whole async function finish, and the page is then closed. Keep in mind that you may need to raise the timeout option above its default of 30 seconds (30000 ms) when launching the cluster:

const cluster = await Cluster.launch({
    // ...
    timeout: 120000 // 2 minutes
});
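
With several workers sharing one desktop, keeping the page open may not be enough on its own: another worker could bring its own tab to the front between your bringToFront call and the screenshot. One possible way around this, assuming all workers run in the same Node.js process and capture the same display, is to serialize only the focus-and-capture step. The helpers createLock and makeTask below are hypothetical, and capture stands for any function that returns a Promise which resolves once the screenshot has been written.

```javascript
// Minimal promise-chain mutex: each caller waits for the previous one to finish.
function createLock() {
    let tail = Promise.resolve();
    return function withLock(fn) {
        const run = tail.then(() => fn());
        // Swallow errors on the chain itself so later callers still get a turn;
        // the caller of withLock still sees the rejection through `run`.
        tail = run.catch(() => {});
        return run;
    };
}

const withDesktopLock = createLock();

// Build a task function for cluster.task. Navigation still runs concurrently;
// only bringing the tab to the front and capturing the desktop is serialized.
function makeTask(capture) {
    return async ({ page, data }) => {
        await page.goto(data.url, { waitUntil: 'load' });
        await withDesktopLock(async () => {
            await page.bringToFront();
            await capture(data.path);
        });
    };
}
```

The trade-off is that screenshots are taken one at a time, so the timeout above has to cover the time a task may spend waiting for its turn.
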
Thomas Dondorf