Score: 30

Is it safe/supported to run multiple instances of Puppeteer at the same time, either

  1. at the process level (multiple `node screenshot.js` processes at the same time) or
  2. at the script level (multiple `puppeteer.launch()` calls at the same time)?

What are the recommended settings/limits on parallel processes?

(In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2) but that seems less likely to work out.)

Bhoomtawath Plinsut
mjs

3 Answers

Score: 48

It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network/disk/memory and task setup.
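Since the practical limit comes from your machine rather than from Puppeteer, it helps to cap how many browsers run at once. Below is a minimal sketch, assuming each unit of work is an async function; `runWithLimit` is a hypothetical helper, not part of Puppeteer or puppeteer-cluster.

```javascript
// Hypothetical helper (not a Puppeteer API): run async jobs with at most
// `limit` of them in flight at the same time. Results keep the input order.
async function runWithLimit(jobs, limit) {
    const results = new Array(jobs.length);
    let next = 0; // index of the next job to start

    // Each worker keeps claiming the next unclaimed job until none remain.
    // `next++` runs synchronously on the single thread, so two workers can
    // never claim the same index.
    async function worker() {
        while (next < jobs.length) {
            const index = next++;
            results[index] = await jobs[index]();
        }
    }

    const workers = [];
    for (let i = 0; i < Math.min(limit, jobs.length); i++) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}
```

With Puppeteer, each job might look like `async () => { const browser = await puppeteer.launch(); try { /* work */ } finally { await browser.close(); } }`, and `limit` is the number you would tune against your machine's memory and network.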

I crawled a few million pages, and from time to time (in my setup, roughly every 10,000 pages) Puppeteer would crash. Therefore, you should have a way to automatically restart the browser and retry the job.
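The restart-and-retry idea can be sketched without any extra library. This assumes `launch` behaves like `() => puppeteer.launch()` (anything that returns an object with an async `close()`); `withFreshBrowser` and `maxAttempts` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: run `job(browser)` on a freshly launched browser.
// If launching or the job throws (e.g. because the browser crashed),
// close that browser and retry on a new one, up to `maxAttempts` times.
async function withFreshBrowser(launch, job, maxAttempts = 3) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        let browser;
        try {
            browser = await launch();
            return await job(browser);
        } catch (err) {
            lastError = err; // remember the failure and try again
        } finally {
            // Always release the browser; ignore errors from an already-dead one.
            if (browser) await browser.close().catch(() => {});
        }
    }
    throw lastError;
}
```

Starting every attempt from a new browser avoids reusing one that may be in a broken state after a crash.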

You might want to check out puppeteer-cluster, which takes care of pooling the browser instances, detecting crashes, and restarting browsers. (Disclaimer: I'm the author.)

Here is an example that creates a cluster:

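```javascript
const { Cluster } = require('puppeteer-cluster');

(async () => {
    // Create a cluster that runs up to 10 browsers in parallel
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_BROWSER,
        maxConcurrency: 10,
    });

    // Queue your jobs (one example)
    cluster.queue(async ({ page }) => {
        await page.goto('http://www.wikipedia.org');
        await page.screenshot({ path: 'wikipedia.png' });
    });

    // Wait until all queued jobs are done, then shut the browsers down
    await cluster.idle();
    await cluster.close();
})();
```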

This is just a minimal example. There are many more ways to use the cluster.

syntagma
Thomas Dondorf
  • Thank you for your package. One of the best I have ever seen. People who want to auto-restart the browser and retry the job can use pm2. – Julien Le Coupanec May 05 '20 at 14:39
  • @Thomas, have you ever encountered a situation where `.type("text")` gets jumbled up between different instances? I had a couple of windows open (not using puppeteer-cluster), and it appeared that when I sent `.type` commands, characters got mixed up between the windows (parts of the string intended for window 1 would get typed into window 2, etc.). Are you aware of this? Any trick to avoid this issue? Does your library handle this case? – Arash Motamedi May 12 '20 at 21:59
  • @AryehArmon I ended up having to "single instance" my application and send type commands to one browser/page/textbox at a time. I'm not sure how puppeteer implements the `type` API, but it looks like it basically sends keyboard events to the OS, and the OS determines which "window" and "textbox" has focus and sends the key events to that input. So, yeah, I couldn't type into multiple inputs at the same time; I had to limit my app to send key events to only one input box at a time. – Arash Motamedi Sep 14 '21 at 16:09
  • Being single-threaded doesn't mean javascript can't interact with multiple concurrent processes running outside of its context. Asynchronous programming allows that single thread to jump around and serve many tasks while others are waiting. Like a server at a restaurant serving several tables. – Adam Tolley Oct 23 '21 at 01:44
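The workaround from the comments (typing into only one page at a time) can be enforced inside a single script with a small promise-based lock. This is a sketch, not a diagnosis of the jumbling: `createLock` is a hypothetical helper, and with Puppeteer the locked function would be something like `() => page.type('#search', 'text')`, where the selector is a placeholder.

```javascript
// Hypothetical helper: a promise-chain lock. Functions passed to the returned
// `runExclusive` run one at a time, in the order they were submitted.
function createLock() {
    let tail = Promise.resolve(); // settles when the last queued task finishes

    return function runExclusive(fn) {
        const result = tail.then(() => fn());
        // Keep the chain alive even if `fn` rejects, so later tasks still run.
        tail = result.catch(() => {});
        return result;
    };
}
```

Only the typing step goes through the lock; navigation and other work on each page can still run concurrently.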

Score: 12

Each `puppeteer.launch()` call boots a new browser for your script to drive, so it's better to have one script make multiple `puppeteer.launch()` calls than to run multiple instances of your script. Even though Node.js is single-threaded, commands and events travel to and from the browser over a WebSocket, so you benefit from Node's asynchronous behavior. Put another way: the browsers don't run one after another; they run in parallel despite Node's single-threaded nature.
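Here is a dependency-free sketch of that point. `setTimeout` stands in for waiting on a browser's reply (an assumption; real Puppeteer calls wait on messages from the browser instead), and `fakeBrowserJob` and `compare` are hypothetical names.

```javascript
// `setTimeout` stands in for waiting on a browser's reply.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulate one browser job that spends almost all its time waiting on I/O.
async function fakeBrowserJob(name, ms, log) {
    await wait(ms);
    log.push(name); // the code after each wait still runs one job at a time
    return name;
}

async function compare() {
    const log = [];
    const names = ['a', 'b', 'c'];

    // Serial: each job starts only after the previous one finished.
    let start = Date.now();
    for (const name of names) {
        await fakeBrowserJob(name, 200, log);
    }
    const serialMs = Date.now() - start;

    // Concurrent: all jobs start together and their waits overlap.
    start = Date.now();
    await Promise.all(names.map((name) => fakeBrowserJob(name, 200, log)));
    const parallelMs = Date.now() - start;

    return { serialMs, parallelMs, log };
}
```

Running `compare().then(console.log)` should report `serialMs` near 600 and `parallelMs` near 200: the waits overlap even though every line of JavaScript runs on one thread.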

For some background: I run a service called browserless (https://browserless.io) that aims to productionalize web-based work. I also maintain a few Docker images here: https://hub.docker.com/r/browserless/chrome/

browserless

Score: -2

Both will work, but the second one doesn't really make sense. The reason is that Node.js is single-threaded, so even though it will work, using multiple browser instances in one process won't be faster or easier than using multiple processes. The best option is to run (1) as you did before; the only thing you need to remember is to keep the tests self-contained.
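For option (1), the separate processes can also be started from Node.js itself. Below is a minimal sketch using the built-in `child_process.spawn`; `runProcesses` is a hypothetical helper, and each entry in `argLists` holds the arguments for one `node` process.

```javascript
const { spawn } = require('child_process');

// Start one Node.js process per argument list, all at the same time,
// and resolve with their exit codes in the same order.
function runProcesses(argLists) {
    return Promise.all(
        argLists.map(
            (args) =>
                new Promise((resolve, reject) => {
                    // process.execPath is the path of the running `node` binary
                    const child = spawn(process.execPath, args, { stdio: 'inherit' });
                    child.on('error', reject); // e.g. the binary could not be started
                    child.on('exit', (code) => resolve(code)); // null if killed by a signal
                })
        )
    );
}
```

Assuming a hypothetical `screenshot.js` that accepted a URL argument, `runProcesses([['screenshot.js', 'https://example.com'], ['screenshot.js', 'https://example.org']])` would run two copies side by side, each launching and closing its own browser.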

piro
  • I don't follow why the second doesn't make sense. Why couldn't multiple simultaneous `puppeteer.launch()` work in the same way as multiple simultaneous `fetch()`? – mjs Jan 23 '18 at 17:06
  • When Node.js communicates with the outside world (the browser), it can only do one task at a time. So even if you open multiple Chromes, you won't be able to receive data from more than one at a time. Your script will work correctly, but there won't be any performance difference from when only one browser is running (there might be a difference if you have a very slow website and you switch between two browser instances). `fetch` works the same way, by the way: you cannot process two responses at "exactly" the same time, so if you receive two responses in the same millisecond, you'll process them one after another. – piro Jan 24 '18 at 17:21