9

Here is my code:

// Open the browser
let browser = await puppeteer.launch({
    args: ["--no-sandbox"]
});
let page = await browser.newPage();

const navPromise = page.waitForSelector('#js_boite_reception').then(() => {
    console.log('received');
});
await page.goto(entMessagesURL);
await navPromise;

// Wait 10 seconds, to make sure it's not because my connection is slow (it's not)
logger.log(`On the messages page (session=${username})`);
await delay(10000);

// Write an html file with the page content
let pageContent = await page.content();
require('fs').writeFileSync('./test.html', pageContent);

'received' is never logged and I get a timeout error. But if I remove the waitForSelector call and only write the test.html file, I can see the following:

With headless mode enabled, part of the page is not loaded:

[screenshot: headless mode enabled]

With headless mode disabled, the whole page is loaded:

[screenshot: headless mode disabled]

With headless mode, only a part of the page content is loaded. I don't know why. Even if I add a timeout of one minute, it won't load more... What can I do?

Note: I also tried setting a user agent:

await page.setUserAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36");

(right after the let page = await browser.newPage() line)

Androz2091
  • If you're trying this on a live website: in headless mode some headers are missing from your requests (most notably the user agent), so some websites block the request because they detect that a bot is sending it. Try setting a user agent with `page.setUserAgent` before sending the request. If the website is on your localhost, you may need to check the console for JS errors. – hretic Jan 28 '20 at 12:02
  • I have the same problem, even with a useragent... – Androz2091 Jan 29 '20 at 15:16
  • Try checking the console logs. See https://github.com/puppeteer/puppeteer#debugging-tips – Orkhan Alikhanov Jan 30 '20 at 06:32
  • There is nothing in the console, and adding a slowmode doesn't fix anything... :\ – Androz2091 Jan 30 '20 at 06:41
  • some websites do more than just checking user-agents to detect headless mode. have you tried https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth#readme? – mbit Jan 30 '20 at 08:04
  • No, it doesn't change anything. But I think the website is not blocking me, but the page doesn't want to load totally in headless mode – Androz2091 Jan 30 '20 at 17:28
  • Have you tried "page.waitForNavigation({ waitUntil: 'networkidle0' })" instead of using waitForSelector? – WMRamadan Feb 02 '20 at 07:05
  • Yes, I tried. And it didn't work – Androz2091 Feb 02 '20 at 08:21
  • @Androz2091 can you try goto with the waitUntil option: "await page.goto(entMessagesURL, {waitUntil: 'networkidle2'});" – Chuong Tran Feb 05 '20 at 10:16
  • I would also look at `waitUntil`: https://pptr.dev/#?product=Puppeteer&version=v1.17.0&show=api-pagegotourl-options – storenth Feb 05 '20 at 16:00
  • it didn't work... – Androz2091 Feb 06 '20 at 17:01
  • have you been able to figure out what was the problem? I have similar issue and suspect that the javascript is not executed to load the missing pieces on the page – grafbumsdi May 15 '20 at 10:08
  • Yes @grafbumsdi, it's now working. Here is my fixed code: https://github.com/Androz2091/pronote-bot/blob/master/pronote/fetchMessage.js. Sorry, I couldn't remember how I fixed it, but this code works. – Androz2091 May 15 '20 at 11:50
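
Several comments above suggest checking the console. As a rough sketch (not the asker's actual fix, which was never identified), listeners like these can show whether the page's scripts throw or whether requests fail in headless mode. The helper name `attachDiagnostics` and its `log` parameter are made up for illustration; the events and methods used are standard Puppeteer ones.

```javascript
// Sketch: surface what the headless page is doing.
// `page` is a Puppeteer Page; `log` is any function taking a string (hypothetical parameter).
function attachDiagnostics(page, log = console.log) {
    // Messages the page writes with console.log/warn/error
    page.on('console', (msg) => log(`[console.${msg.type()}] ${msg.text()}`));
    // Uncaught exceptions thrown by the page's own scripts
    page.on('pageerror', (err) => log(`[pageerror] ${err.message}`));
    // Requests that never completed (blocked, aborted, DNS failure, ...)
    page.on('requestfailed', (req) => {
        const failure = req.failure();
        log(`[requestfailed] ${req.url()} ${failure ? failure.errorText : ''}`);
    });
    // Responses that came back with an HTTP error status
    page.on('response', (res) => {
        if (res.status() >= 400) log(`[http ${res.status()}] ${res.url()}`);
    });
}
```

Call `attachDiagnostics(page)` right after `browser.newPage()` and before `page.goto(...)`, so that nothing emitted during the initial load is missed.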

4 Answers

10
await page.setUserAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36");

This worked for me! The website was blocking headless mode when I tried it locally. After setting the user agent, I was finally able to find the selector.
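
Instead of hard-coding an old Chrome 44 string, one option is to start from the browser's own user agent and only remove the headless marker. This is a sketch under the assumption that your Chrome build reports `HeadlessChrome` in headless mode (classic headless does; other builds may not). The helper names are hypothetical; `browser.userAgent()` and `page.setUserAgent()` are existing Puppeteer methods.

```javascript
// Sketch: reuse the browser's own user agent minus the headless marker.
// Helper names are hypothetical; `browser` and `page` come from puppeteer.launch()/newPage().
function stripHeadlessToken(userAgent) {
    // Classic headless Chrome reports "HeadlessChrome/<version>" where a regular
    // build reports "Chrome/<version>"; strings without the marker are left unchanged.
    return userAgent.replace('HeadlessChrome', 'Chrome');
}

async function useRegularUserAgent(browser, page) {
    const original = await browser.userAgent();
    const patched = stripHeadlessToken(original);
    // Must be called before page.goto() to affect the first request
    await page.setUserAgent(patched);
    return patched;
}
```

Usage would be `await useRegularUserAgent(browser, page);` right after `browser.newPage()`.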

Adrian Mole
Boost
1

I'm pretty sure it's a race condition, and that it happens because you start waiting for the selector before you navigate to the page.

Try reordering these lines so that you navigate first and wait afterwards:

await page.goto(entMessagesURL);
// Wait for the inbox element only once navigation has completed
await page.waitForSelector('#js_boite_reception');
console.log('received');

I can't try to reproduce your error because I don't know which page it is or what language it is written in.

Alejandro Molina
1

You can try the waitUntil option:

await page.goto(entMessagesURL, {waitUntil: 'networkidle2'});
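
For reference, `waitUntil` accepts 'load', 'domcontentloaded', 'networkidle0' or 'networkidle2'. Below is a minimal sketch that combines it with the selector wait and reports a timeout instead of throwing. The helper name `gotoAndWaitFor` and its defaults are made up here, and the check on `err.name` assumes Puppeteer's `TimeoutError` class, which sets that name in the versions I know of.

```javascript
// Sketch: navigate with an explicit waitUntil, then wait for the element.
// `page` is a Puppeteer Page; the helper name and defaults are hypothetical.
async function gotoAndWaitFor(page, url, selector, options = {}) {
    const { waitUntil = 'networkidle2', timeout = 30000 } = options;
    // waitUntil accepts 'load', 'domcontentloaded', 'networkidle0' or 'networkidle2'
    await page.goto(url, { waitUntil, timeout });
    try {
        await page.waitForSelector(selector, { timeout });
        return true;
    } catch (err) {
        // Assumption: Puppeteer rejects with an error named 'TimeoutError'
        // when the selector never appears within `timeout` milliseconds
        if (err.name === 'TimeoutError') return false;
        throw err;
    }
}
```

With the question's names this would be `const found = await gotoAndWaitFor(page, entMessagesURL, '#js_boite_reception');`. When it returns false, dumping `await page.content()` to a file (as the question already does) shows what was actually loaded.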
Chuong Tran
0

I had no problems on the login page, but the home page was broken (half loaded) in headless mode. (The puppeteer-extra-plugin-stealth plugin was also active.)

In my case this helped:

  1. In a normal browser, copy the headers from a request; I took the first request sent after the submit/sign-up button was pressed.
  2. Set all or some of the copied headers before you do anything else (before navigating to the target page):
await page.setUserAgent('......................');
await page.setExtraHTTPHeaders({
  'Accept-Language': '.....................',
  'Cache-Control':'.....................',
  'Connection': 'keep-alive',
  'Sec-Fetch-User': '?1',
  'sec-ch-ua': '.....................',
  'sec-ch-ua-mobile': '?0',
  'sec-ch-ua-platform': '"Linux"',
});
  3. Try it. If it still doesn't work, remove or add some headers and retry (the original request had 9 more headers).
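
To make the "remove or add some headers and retry" step less tedious, the copied headers can be kept in one object and filtered by name on each attempt. This is only a sketch of that idea: `pickHeaders` and `applyHeaders` are hypothetical helpers, and `captured` stands for whatever you copied from your own browser.

```javascript
// Sketch: keep only some of the headers copied from a normal browser request.
// `captured` is whatever you copied from DevTools; the helper names are hypothetical.
function pickHeaders(captured, namesToKeep) {
    // Header names are case-insensitive, so compare them in lower case
    const wanted = new Set(namesToKeep.map((name) => name.toLowerCase()));
    const picked = {};
    for (const [name, value] of Object.entries(captured)) {
        if (wanted.has(name.toLowerCase())) picked[name] = value;
    }
    return picked;
}

async function applyHeaders(page, captured, namesToKeep) {
    const headers = pickHeaders(captured, namesToKeep);
    // Must run before page.goto() so the headers are sent with the first request
    await page.setExtraHTTPHeaders(headers);
    return headers;
}
```

Each retry then only changes the `namesToKeep` list, for example `await applyHeaders(page, captured, ['accept-language', 'sec-ch-ua']);`.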
Tyler2P
IO_Nox