Creating a new tab in Puppeteer inside a loop causes a navigation timeout

Question

I have recently been learning Puppeteer from its docs and am trying to scrape some information.

First approach

First, I collect a list of URLs from the main page. Second, I create a new tab, visit those URLs one by one, and collect some data from each. I suspect that once I enter the loop the new tab does not work as I expect: it freezes without returning any data. Eventually I get the error TimeoutError: Navigation timeout of 30000 ms exceeded. Is there a better approach?

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const mainpage = await browser.newPage();

  console.log('goto main page'.green);
  await mainpage.goto(mainURL);

  console.log('collecting some url'.green);
  const URLS = await mainpage.evaluate(() =>
    Array.from(
      document.querySelectorAll('.result-actions a'),
      (element) => element.href
    )
  );
  if (typeof URLS[0] === 'string') console.log('OK'.green);

  console.log('collecting finished'.green);

  const newTab= await browser.newPage();

  console.log('create new tab'.green);

  var data = [];

  for (let i = 0, n = URLS.length; i < n; i++) {
    //console.log(URLS[i]);

    // use this new tab to collect some data then close this tab
    // continue this process

    await newTab.waitForNavigation();
    await newTab.goto(URLS[i]);
    await newTab.waitForSelector('.profile-phone-column span a');
    console.log('Go each url using new tab'.green);

    // collecting data
    
    data.push(collected_data);
    // close this tab
    await collectNamePage.close();
    console.log(data);
  }
  await mainpage.close();
  await browser.close();
  console.log('closing browser'.green);
})();

Second approach

This time I wanted to skip the part where I collect the data using a new tab. So I collect my URLs using page.$$(), iterate over them with for...of, and try to collect my data using elementHandle.$(selector), but this approach also failed.

I am getting frustrated. Am I doing it the wrong way, or have I misunderstood the documentation?

vsemozhebuty · Accepted Answer · 2020-07-30T17:24:52.917


In your script, you do not need newTab.waitForNavigation(); at all. Usually, it is used when the navigation is triggered by some event, such as a click. When you just use .goto(), Puppeteer waits for the page to load automatically.
Even if you do need waitForNavigation(), you usually should not await it before the navigation is triggered; otherwise no navigation ever starts and you just get the timeout. Instead, you await it together with the navigation trigger:
```
await Promise.all([element.click(),  page.waitForNavigation()]);
```

So try to just delete await newTab.waitForNavigation();.

Also, do not close the new tab inside the loop; close it after the loop.

Edited script:

const puppeteer = require('puppeteer');
const mainURL = 'https://www.psychologytoday.com/us/therapists/illinois/';

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const mainpage = await browser.newPage();

  console.log('goto main page');
  await mainpage.goto(mainURL);

  console.log('collecting urls');
  const URLS = await mainpage.evaluate(() =>
    Array.from(
      document.querySelectorAll('.result-actions a'),
      (element) => element.href
    )
  );
  if (typeof URLS[0] === 'string') console.log('OK');
  console.log('collection finished');

  const collectNamePage = await browser.newPage();

  console.log('create new tab');

  var data = [];

  for (let i = 0, totalUrls = URLS.length; i < totalUrls; i++) {
    console.log(URLS[i]);

    await collectNamePage.goto(URLS[i]);
    await collectNamePage.waitForSelector('.profile-phone-column span a');
    console.log('create new tab and go there');

    // collecting data
    const [name, phone] = await collectNamePage.evaluate(
      () => [
        document.querySelector('.profile-middle .name-title-column h1').innerText,
        document.querySelector('.profile-phone-column span a').innerText
      ]
    );
    data.push({ name, phone });
  }

  console.log(data);
  await collectNamePage.close();

  await mainpage.close();
  await browser.close();
  console.log('closing browser');
})();
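
A hedged variation on the script above, assuming you also want the tab to be closed even when one of the pages fails to load. The helper name `scrapeSequentially` and the `extract` callback are hypothetical and not part of Puppeteer; only `browser.newPage()`, `page.goto()`, and `page.close()` are Puppeteer calls.

```javascript
// Hypothetical helper: visits each URL in a single reused tab, one at a time.
// `browser` is a Puppeteer Browser; `extract` is a caller-supplied async
// function that receives the page after navigation and returns the data.
async function scrapeSequentially(browser, urls, extract) {
  const page = await browser.newPage(); // one tab for the whole loop
  const results = [];
  try {
    for (const url of urls) {
      try {
        await page.goto(url); // goto() already waits for the page load
        results.push({ url, data: await extract(page) });
      } catch (err) {
        // For example a TimeoutError on one URL: record it and keep going.
        results.push({ url, error: err.message });
      }
    }
  } finally {
    await page.close(); // close the tab once, after the loop
  }
  return results;
}
```

In the script above, `extract` could wait for `.profile-phone-column span a` and then run the same `evaluate` call that collects `name` and `phone`.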

edited Jul 30 '20 at 17:24

answered Jul 30 '20 at 16:37

vsemozhebuty


Thanks for your response @vsemozhebuty. Now I get a new error: `Execution context was destroyed, most likely because of a navigation.` [Here](https://pastebin.com/SWQGEHZ6) is the full console log. I guess this problem occurs when you try to run a function but the target (tab) has already been closed. Any suggestions? And why does `await newTab.$('selector_thing').innerText;` seem to be undefined? Shouldn't it be the desired text, since I know that node exists? – NJN Jul 30 '20 at 16:53
Unfortunately, it is hard to tell until you provide the real code (currently your example has omissions and editing artifacts — for example, what is `collectNamePage`?). – vsemozhebuty Jul 30 '20 at 17:02
As for `await newTab.$('selector_thing').innerText;`: you should not confuse traditional DOM elements and their API (`.innerText`) with the element handles that are returned by many puppeteer functions. See for details: https://stackoverflow.com/questions/55388455/get-href-attribute-in-pupeteer-node-js/55391319#55391319 – vsemozhebuty Jul 30 '20 at 17:05
Then [here](https://pastebin.com/saemnJUu) is the full code @vsemozhebuty. It would be a great help if you could help me resolve that issue – NJN Jul 30 '20 at 17:12
Yes, I finally got the data. Thanks for clearing up the difference between ElementHandles and traditional DOM elements. But I still get the `Execution context was destroyed` error – NJN Jul 30 '20 at 17:23
I've edited your script a bit: I just removed the `colors` package code and changed the way the data is collected (see the answer). But I cannot reproduce the errors, sorry. For me, everything works fine. – vsemozhebuty Jul 30 '20 at 17:31

It's OK @vsemozhebuty, I resolved that one too. Sorry for asking so many things. I accepted your answer. Cheers – NJN Jul 30 '20 at 17:44
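
To illustrate the ElementHandle point from the comments: `page.$()` returns a promise for an ElementHandle (or null), not a DOM element, so `await newTab.$('selector_thing').innerText` reads `.innerText` off the promise itself and yields undefined. Below is a minimal sketch of reading the text inside the page context instead. The helper name `readInnerText` is hypothetical; `page.$()`, `elementHandle.evaluate()`, and `elementHandle.dispose()` are existing Puppeteer methods.

```javascript
// Hypothetical helper: returns the innerText of the first element matching
// `selector`, or null when nothing matches.
async function readInnerText(page, selector) {
  const handle = await page.$(selector); // ElementHandle or null
  if (handle === null) return null;
  try {
    // The callback runs in the browser, where `el` is a real DOM element.
    return await handle.evaluate((el) => el.innerText);
  } finally {
    await handle.dispose(); // release the handle when done
  }
}
```

`page.$eval(selector, (el) => el.innerText)` does the same in a single call, but it throws when no element matches the selector instead of returning null.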
