Questions tagged [puppeteer-cluster]

puppeteer-cluster manages a pool of headless browsers via puppeteer. This is useful for crawling multiple pages in parallel or for keeping a pool of open browsers.

puppeteer-cluster creates a pool of puppeteer workers by spawning multiple browsers, contexts or pages via puppeteer. The library keeps track of queued jobs and handles thrown errors. In addition, it can retry jobs or introduce delays when crawling a domain.
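
The queueing behaviour described above can be illustrated without a browser. The following is a minimal sketch in plain Node.js of a bounded worker pool that tracks queued jobs, collects thrown errors, and retries failed jobs after a delay. It is not puppeteer-cluster's implementation: `runPool` is a hypothetical helper, and the option names `maxConcurrency`, `retryLimit` and `retryDelay` are chosen here only for illustration.

```javascript
// Conceptual sketch only: plain Node.js, no browser and no puppeteer-cluster.
// runPool and its option names are hypothetical, chosen for illustration.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runPool(jobs, task, options = {}) {
  const { maxConcurrency = 2, retryLimit = 0, retryDelay = 0 } = options;

  // Every queued job remembers how many times it has been retried.
  const queue = jobs.map((data) => ({ data, attempts: 0 }));
  const results = [];
  const errors = [];

  // Each worker keeps pulling jobs from the shared queue until it is empty.
  async function worker() {
    while (queue.length > 0) {
      const job = queue.shift();
      try {
        results.push(await task(job.data));
      } catch (err) {
        if (job.attempts < retryLimit) {
          job.attempts += 1;
          await sleep(retryDelay); // wait before the job is tried again
          queue.push(job);         // this worker loops again and can pick it up
        } else {
          // Retries exhausted: record the error instead of crashing the pool.
          errors.push({ data: job.data, error: err });
        }
      }
    }
  }

  // Start a fixed number of workers and wait until all of them are idle.
  await Promise.all(Array.from({ length: maxConcurrency }, () => worker()));
  return { results, errors };
}
```

Calling `await runPool(urls, task, { maxConcurrency: 2, retryLimit: 1 })` with an async `task` returns an object with the collected `results` and `errors`. Results arrive in completion order, not input order, because the workers run concurrently.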

73 questions
0
votes
1 answer

Puppeteer-Cluster consistently using only half of my cores

I'm running a pretty standard puppeteer cluster job, with the following settings: const cluster = await Cluster.launch({ concurrency: Cluster.CONCURRENCY_PAGE, // maximize sharing data between jobs maxConcurrency: 8, monitor:…
0
votes
1 answer

puppeteer-cluster, different data to the same url

I put an example below. I want to enter different search inputs (firstWord + scndWord) from an array of objects into two Google pages at the same time, so the number of pages opened depends dynamically on the array length. The 1st page opens Google, then writes red…
med amine
  • 1
  • 2
0
votes
1 answer

Puppeteer-cluster with cheerio in express router API returns empty response

I'm writing an API with express, puppeteer-cluster and cheerio that returns all anchor elements containing one or more words that can be added as query parameters. I want to use puppeteer in order to get elements that are javascript generated too.…
Bella
  • 414
  • 2
  • 13
0
votes
0 answers

how to trigger chrome extension function from puppeteer Cluster

I need to trigger a function in an extension's background.js from puppeteer-cluster. Here is my code: const wait = (ms) => new Promise(resolve => setTimeout(resolve, ms)); (async () => { const puppeteer = addExtra(puppeteerStream); const…
0
votes
0 answers

Puppeteer is detected on ubuntu server but not locally

so I have a puppeteer script to watch TikTok live streams and when I run it locally it works as expected, but in Ubuntu 20.04 LTS Server the page loads for the live stream, but the live stream never starts and it requires me to log in, which doesn't…
0
votes
0 answers

Puppeteer Enable Third Party Cookies For Incognito

Is there a way to enable/allow third-party cookies through puppeteer or a Node program? I am also using puppeteer-cluster. I have tried the chrome-profile solution, but that is not working.
0
votes
0 answers

How handle multiple functions in puppeteer-cluster?

I have a two-step program: (1) get a list of hrefs from a page; (2) loop infinitely over each page in this list, get an element and display it in the console. I tried to use functions with puppeteer-cluster, but it doesn't work properly. const { Cluster } =…
user2178964
  • 124
  • 6
  • 16
  • 40
0
votes
1 answer

How to interrupt puppeteer-cluster execution inside an infinite loop?

I'm learning how to use Puppeteer cluster and I have a question. How can I interrupt a puppeteer cluster execution running in an infinite loop, by using a key press? The code would be something like this: const { Cluster } =…
igortorati
  • 23
  • 3
0
votes
0 answers

MySQL Stream and piping data

I am using MySQL to get urls I have for a web scraper. Currently I get the data using csv-stringify as shown below: conn.query('SELECT id, URL FROM company_table') .stream() .pipe(stringifier).pipe(process.stdout); I see the data in the…
Tom
  • 251
  • 1
  • 6
  • 16
0
votes
2 answers

Puppeteer error while running in ubuntu machine

when I run puppeteer on Ubuntu I get this error: UnhandledPromiseRejectionWarning: Error: Unable to launch browser, error message: Failed to launch the browser process! [2098647:2098647:0520/162023.317120:ERROR:vaapi_wrapper.cc(594)] Could not get a…
Mike
  • 43
  • 1
  • 7
0
votes
1 answer

Problem getting puppeteer-cluster waiting on page event before closing

I'm currently setting up a CI environment to automate e2e tests our team runs in a test harness. I am setting this up on Gitlab and currently using Puppeteer. I have an event that fires from our test harness that designates when the test is…
xtr33me
  • 936
  • 1
  • 13
  • 39
0
votes
1 answer

Puppeteer cluster.close() "crashes" after calling cluster.queue()

Long story short, I've made an app for web scraping, and in order for it to run more than 1 process at a time (more than 1 Chromium opened), I used puppeteer-cluster. I've got it to run several processes at once, but the…
0
votes
1 answer

Infinite loop (on purpose) using puppeteer cluster

I am very new to puppeteer-cluster. My goal is to scrape a list of 100 sites infinitely, so once I get to the 100th link, script would start over again (Ideally reusing the same cluster instance). Is there a better way, or proper way to do this? I…
chroman
  • 1,534
  • 12
  • 18
0
votes
1 answer

How to target multiple identical "input[type="file"]" in Pupeteer?

I have a page in Puppeteer on which I'm trying to initiate a file upload, and it has two upload buttons. The problem is that both buttons that initiate the file upload have the same type and selector. This working code allows me to upload a file to the…
0
votes
2 answers

Looping through multiple links properly

I am very new to puppeteer. I started yesterday, and I'm trying to make a program that flips through a URL that incrementally stores player IDs one after the other and saves the player stats using neDB. There are thousands of links to flip through…