Questions tagged [puppeteer-cluster]

puppeteer-cluster manages a pool of headless browsers via puppeteer. This is useful to crawl multiple pages in parallel or to keep a pool of open browsers.

puppeteer-cluster creates a pool of puppeteer workers by spawning multiple browsers, contexts, or pages via puppeteer. The library keeps track of queued jobs and handles thrown errors. In addition, it lets you retry failed jobs or introduce delays when crawling a domain.
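
The mechanism described above (a queue of jobs, a fixed number of parallel workers, error tracking, and retries) can be illustrated with a small, dependency-free sketch. `MiniPool` is a hypothetical class written for this illustration only. It is not puppeteer-cluster's code or API. The task function is a plain async function standing in for work done in a browser, context, or page, and the URLs are placeholders. In the real library, comparable settings such as `maxConcurrency` and `retryLimit` are passed to `Cluster.launch`.

```javascript
// Hypothetical MiniPool: an illustrative sketch, NOT puppeteer-cluster's
// implementation. Each job is plain data; the task function stands in for
// work that would run in a browser, context, or page.
class MiniPool {
  constructor({ maxConcurrency = 2, retryLimit = 0 } = {}) {
    this.maxConcurrency = maxConcurrency; // max jobs running at once
    this.retryLimit = retryLimit;         // extra attempts per failed job
    this.queue = [];                      // pending jobs: { data, attempts }
    this.running = 0;                     // jobs currently executing
    this.peak = 0;                        // highest concurrency observed
    this.results = [];                    // { data, value } for successes
    this.errors = [];                     // { data, error } once retries run out
    this.idleResolvers = [];              // callers waiting in idle()
  }

  // Register the function that processes one job.
  task(fn) {
    this.taskFn = fn;
  }

  // Add a job and start it right away if a slot is free.
  queueJob(data) {
    this.queue.push({ data, attempts: 0 });
    this.next();
  }

  // Fill free slots from the queue; wake idle() waiters when all work is done.
  next() {
    while (this.running < this.maxConcurrency && this.queue.length > 0) {
      const job = this.queue.shift();
      this.running++;
      this.peak = Math.max(this.peak, this.running);
      this.run(job); // not awaited: run() catches its own errors
    }
    if (this.running === 0 && this.queue.length === 0) {
      this.idleResolvers.splice(0).forEach((resolve) => resolve());
    }
  }

  async run(job) {
    try {
      const value = await this.taskFn(job.data);
      this.results.push({ data: job.data, value });
    } catch (error) {
      job.attempts++;
      if (job.attempts <= this.retryLimit) {
        this.queue.push(job); // re-queue instead of failing the whole run
      } else {
        this.errors.push({ data: job.data, error });
      }
    } finally {
      this.running--;
      this.next();
    }
  }

  // Resolve once the queue is empty and no job is running.
  idle() {
    if (this.running === 0 && this.queue.length === 0) return Promise.resolve();
    return new Promise((resolve) => this.idleResolvers.push(resolve));
  }
}

// Usage with placeholder URLs: "/flaky" fails once, "/broken" always fails.
async function demo() {
  const pool = new MiniPool({ maxConcurrency: 2, retryLimit: 1 });
  const failedOnce = new Set();
  pool.task(async (url) => {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated page work
    if (url.endsWith('/broken')) throw new Error('permanent failure');
    if (url.endsWith('/flaky') && !failedOnce.has(url)) {
      failedOnce.add(url);
      throw new Error('transient failure');
    }
    return url.length;
  });
  for (const url of ['https://example.com/a', 'https://example.com/b',
                     'https://example.com/flaky', 'https://example.com/broken']) {
    pool.queueJob(url);
  }
  await pool.idle();
  return pool;
}

demo().then((pool) => {
  // prints: 3 succeeded, 1 failed, peak concurrency 2
  console.log(`${pool.results.length} succeeded, ${pool.errors.length} failed, peak concurrency ${pool.peak}`);
});
```

Keeping the retry logic in the pool rather than in each task means a thrown error only re-queues that one job, and a job that keeps failing ends up in `errors` instead of stopping the run. A production pool would also need per-job timeouts and delays between requests to the same domain, which this sketch leaves out; puppeteer-cluster exposes such behavior through launch options like `timeout` and `sameDomainDelay`.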

73 questions

1 vote, 0 answers

Puppeteer-cluster: many screenshots best practice

I'm using Puppeteer-cluster to process a high volume of screenshots directly from various HTML strings, and the response should be returned as soon as possible, ideally within a few milliseconds. Since opening and closing a browser for each screenshot is not efficient, we want to use a…
TBE

1 vote, 1 answer

Get result from listener async

I use puppeteer-cluster with Node.js and am new to it. I'm having some trouble: I need to get an XHR response from a site. I am listening to the page, but I cannot write the resulting value to a variable. I need to use the value in another part of the…

1 vote, 0 answers

puppeteer cluster _ how to prevent close page?

I am glad I found puppeteer-cluster. This library made crawling and automation tasks easy; thanks to Thomas Dondorf. According to the author of puppeteer-cluster, when a task finishes, its page is closed immediately. This is good by the…
Babak Abadkheir

1 vote, 0 answers

puppeteer cluster _ no sand box option is not working on launch

This is my puppeteer-cluster config: const cluster = await Cluster.launch({ concurrency: Cluster.CONCURRENCY_CONTEXT, workerCreationDelay: 2000, puppeteerOptions:{args: ['--no-sandbox', '--disable-setuid-sandbox']}, maxConcurrency:…
Babak Abadkheir

1 vote, 0 answers

Puppeteer-Cluster Not using nodeJS workers

I'm using puppeteer-cluster in multi-worker mode in Node.js. For some reason, only one worker opens the number of concurrent browsers that I defined; the others are ignored. What am I doing wrong? Basically, I start a cluster with 2 browser…
talkl

1 vote, 0 answers

How to modularize puppeteer-cluster code?

I'm pretty new to this module. I heard it's better than just using puppeteer because it can run tasks in parallel. Anyway, I need help modularizing my code. Here's the example program on the npm page: const { Cluster } =…

1 vote, 0 answers

Best way of running a cron job with puppeteer-cluster in a cron meant for load testing

I am doing load testing and don't want the server to crash, which it did earlier when I was launching separate puppeteer instances and running two queries, each fetching 100 MB of data from a MySQL database. When I ran a single puppeteer instance it…

1 vote, 0 answers

Puppeteer cluster sometimes cannot start in k8s

I'm using puppeteer with puppeteer-cluster, this is deployed on k8s and everything works great. The only problem I'm having is that sometimes the pod won't start, and throws this exception: (node:24) UnhandledPromiseRejectionWarning: Error: Unable…
orirab

1 vote, 1 answer

puppeteer-cluster: Setting a timeout on individual execution tasks

I'm trying to get individual tasks to throw a time-out during stress testing to see what my calling program will do. However, my cluster keeps tasks fresh indefinitely. It appears to queue all my cluster.execute calls which then are kept in memory…
G_V

1 vote, 1 answer

Is there a way to override "tab closing" in puppeteer cluster?

Puppeteer-cluster is closing tabs before I can take a screenshot. I am using puppeteer-cluster with maxConcurrency 8. I need to take a screenshot after each page loads (approx. 20,000 URLs). Page.screenshot is not useful for me. My screenshot should…
lorenz

1 vote, 1 answer

How to pull data from PostgreSQL, process, then store in javascript?

I'm not too familiar with advanced JavaScript and am looking for some guidance. I want to store webpage content in a database using puppeteer-cluster. Here's a starting example: const { Cluster } = require('puppeteer-cluster'); (async () => { const…
sojim2

0 votes, 0 answers

Passing data from Puppeteer to Vue JS Component

The data flow of my app begins with a backend API request that launches a Vue component using puppeteer. Is there any way to pass that data from the backend (Express) to the launched Vue component, other than making the Vue component call a…

0 votes, 0 answers

Try-catch not working as expected in Puppeteer-Cluster (JavaScript)

I have some simple Puppeteer-Cluster code written in Node.js: require("dotenv").config(); (async () => { // Create a cluster with 2 workers const cluster = await Cluster.launch({ concurrency: Cluster.CONCURRENCY_PAGE, …

0 votes, 0 answers

How to know when puppeteer-cluster initializes a new worker

I have a gated website that I am scanning with puppeteer-cluster. I have a maximum concurrency of 5 with context_browser to share session information across tabs. This works great for the first 5 scans, but once the worker dies and a new one is…
Wayne F. Kaskie

0 votes, 1 answer

puppeteer-cluster seems to act in serial instead of parallel

I made a cluster of puppeteer workers using puppeteer-cluster: const cluster = await Cluster.launch({ concurrency: Cluster.CONCURRENCY_PAGE, puppeteerOptions: { userDataDir: path.join(__dirname,'user_data/1'), headless:…
DarkZeus