
So I'm trying to use puppeteer to parse through a bunch of pages. I'm able to do so successfully, but not when I try to do multiple pages at the same time. I understand what's happening - rather than executing a block of code one at a time per row, the code is just hammering the browser with asyncs. My code looks similar to:

const MY_USER = process.env.MY_USER;
const MY_PWD = process.env.MY_PWD;

const puppeteer = require('puppeteer');
const fs = require('fs')
var results = [];

(async () => { 
  const browser = await puppeteer.launch({
    headless: true,
    ignoreHTTPSErrors: true,
    }); 
    
    const page = await browser.newPage() 
    
    await page.setViewport({
      width: 1920,
      height: 2200,
      }); 
    //Log into my site   
    await page.goto('https://example.com', {"waitUntil" : "networkidle0"});
    await page.type('input[name="username"]', MY_USER); 
    await page.type('input[name="password"]', MY_PWD);
    await page.click('[type="submit"]');
    //Wait for it to load...
    await page.waitForTimeout(1*2000);
    
    //Here is when the problems begin
    fs.readFileSync("myCSV.csv", { 
      encoding: 'utf-8'
      })
      .split('\n')
      .map(async (row) => { 
        await captureMyPage(row[0]);
      })
      
  async function captureMyPage(thisPage)
  {
    await page.goto('https://example.com/'+thisPage, {"waitUntil":"networkidle0"});
    await page.click('thisThing')
    await page.click('thisOtherThing')
    await page.click('thisThirdThing')
    
    await page.screenshot({
      path: 'files/'+thisPage+'.jpg',
      fullPage: true,
    });
  
  }
  
})();

So, the code works if I do it on one page, but what I'm asking is, how do I get

await captureMyPage(row[0])

to wait until that whole function has finished executing before it moves on to the next row?

Thanks!

Jan .Jedrasik
  • Does this answer your question? [Crawling multiple URLs in a loop using Puppeteer](https://stackoverflow.com/questions/46293216/crawling-multiple-urls-in-a-loop-using-puppeteer) – ggorlen Mar 04 '21 at 14:46
  • See also [Using async/await with a forEach loop](https://stackoverflow.com/questions/37576685/using-async-await-with-a-foreach-loop) – ggorlen Mar 04 '21 at 14:46

1 Answer

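Use a for...of loop instead of map. An async callback passed to map returns a promise immediately, so map starts every call at once and the await inside the callback does not pause the iteration the way you expect.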

const rows = fs.readFileSync("myCSV.csv", { 
  encoding: 'utf-8'
}).split('\n');
for (const row of rows) {
  await captureMyPage(row[0]);
}
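
To see the difference without launching a browser, here is a minimal sketch. Note that fakeCapture, runWithMap, runSequentially and firstColumn are hypothetical names made up for this illustration; fakeCapture just sleeps for a few milliseconds in place of the real captureMyPage.

```javascript
// Hypothetical stand-in for captureMyPage: it only logs when it starts and ends.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fakeCapture(name, log) {
  log.push(`start ${name}`);
  await sleep(10); // pretend to navigate, click and take a screenshot
  log.push(`end ${name}`);
}

// map() calls the async callback for every row right away, so all
// captures are in flight at the same time on the same page.
async function runWithMap(rows) {
  const log = [];
  const promises = rows.map((row) => fakeCapture(row, log));
  await Promise.all(promises); // only so the demo can inspect the finished log
  return log;
}

// for...of with await finishes one capture before starting the next.
async function runSequentially(rows) {
  const log = [];
  for (const row of rows) {
    await fakeCapture(row, log);
  }
  return log;
}

// Assumption: the CSV is simple comma-separated text with no quoted fields.
// After split('\n') each row is a string, so row[0] would be its first character.
function firstColumn(row) {
  return row.split(',')[0].trim();
}
```

With two rows, runWithMap logs both starts before either end, while runSequentially logs start and end for the first row before touching the second. Also, if the page name is meant to be the first CSV column rather than the first character of the line, something like firstColumn(row) is probably what you want to pass to captureMyPage.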
hoangdv