Here's the scoop.
I'm trying to use Puppeteer v18.0.5 with the bundled Chromium browser against a specific website, running on Node v16.16.0. However, when I enable request interception via page.setRequestInterception(true), the HTTPRequests for all image resources are lost: my handler is invoked far less often while intercepting than when not intercepting, and the page never fires any requests for images. When I disable interception, the page loads normally. Yes, I know about invoking continue() on every request; I'm already doing that in the page's request handler.

I've also pored over the Puppeteer issue pages and found similar symptoms reported against some earlier Puppeteer versions, but those were different issues that have since been resolved. This one seems unique.

I've looked through the Puppeteer source code as well as the CDP events to try to find an explanation, but have found none.

As an important note for anyone trying to reproduce this: you must be proxied through a server in the general London area in order to load this site successfully.

Here's my code to reproduce:

const puppeteer = require('puppeteer');

(async () => {
    const options = {
        browserWidth: 1366,
        browserHeight: 983,
        intercepting: false
    };

    const browser = await puppeteer.launch(
        {
            args: [`--window-size=${options.browserWidth},${options.browserHeight}`],
            defaultViewport: {width: options.browserWidth, height: options.browserHeight},
            headless: false
        }
    );
    const page = (await browser.pages())[0];
    page.on('request', async (request) => {
        console.log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()} | ${request._requestId}`);
        if (options.intercepting) await request.continue();
    });
    await page.setRequestInterception(options.intercepting);
    await page.goto('https://vegas.williamhill.com', {waitUntil: 'networkidle2', timeout: 65000});

    // To give a moment to view the page in headful mode before closing browser.
    await new Promise(resolve => setTimeout(resolve, 5000));
    await browser.close();
})();

Here's what the page looks like with intercepting disabled: [screenshot: Expected Page Load]

Here's what the page looks like with intercepting enabled and continuing all requests: [screenshot: Page load while intercepting and continuing all requests]

With request interception disabled, my handler is invoked for 104 different requests, but with interception enabled it's invoked only 22 times. I'm not hitting a navigation timeout, since the .goto() call returns before my timeout each time.
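To narrow down which requests go missing, the two runs can be diffed by resource type. The following is only a minimal sketch: it assumes each logged request has been recorded as a plain { url, resourceType } object, and summarizeMissing is a hypothetical helper, not part of Puppeteer. The example.test URLs are placeholder data.

```javascript
// Hypothetical helper: list the URLs seen in the baseline run (interception off)
// that never showed up in the intercepted run, grouped by resource type.
// Each entry is assumed to look like { url, resourceType }.
function summarizeMissing(baseline, intercepted) {
    const seen = new Set(intercepted.map((r) => r.url));
    const missingByType = {};
    for (const r of baseline) {
        if (seen.has(r.url)) continue;
        if (!missingByType[r.resourceType]) missingByType[r.resourceType] = [];
        missingByType[r.resourceType].push(r.url);
    }
    return missingByType;
}

// Placeholder data standing in for the two request logs.
const baselineLog = [
    {url: 'https://example.test/', resourceType: 'document'},
    {url: 'https://example.test/a.png', resourceType: 'image'},
    {url: 'https://example.test/app.js', resourceType: 'script'}
];
const interceptedLog = [
    {url: 'https://example.test/', resourceType: 'document'},
    {url: 'https://example.test/app.js', resourceType: 'script'}
];

const missingRequests = summarizeMissing(baselineLog, interceptedLog);
console.log(missingRequests);
```

In the real script, the logs could be filled by pushing {url: request.url(), resourceType: request.resourceType()} inside the existing 'request' handler. URLs carrying cache-busting query strings may differ between runs, so exact URL matching is only a first approximation.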

Any insight into what configuration/strategy I'm missing would be immensely appreciated.

Rob Mount
  • Similar issue here. Even when my request handler executes `request.continue();` as its only line of code, it seems that some inline scripts are not executed, so the page does not fully load when using request interception. – rabudde Jun 16 '23 at 12:22

1 Answer

Maybe you are intercepting some JavaScript files that initiate the requests you are not seeing?

Yisheng Jiang
  • I've wondered that also, but I have a blanket handler that allows all requests to continue. So if a JS file is getting intercepted, I'm very curious where the intercept is being dispatched for handling, if not in the request event. – Rob Mount Oct 04 '22 at 15:16
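One way to rule out continue() calls failing silently inside the blanket handler is to make the handler defensive. The sketch below is an assumption-laden illustration, not a fix for the missing requests: makeRequestHandler is a hypothetical factory, and it relies on HTTPRequest.isInterceptResolutionHandled(), which to my understanding is available in recent Puppeteer versions, including v18.

```javascript
// Hypothetical factory for a 'request' listener.
// `intercepting` mirrors options.intercepting from the reproduction script;
// `log` is injectable so the handler can be exercised without a browser.
function makeRequestHandler(intercepting, log = console.log) {
    return async (request) => {
        log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()}`);
        if (!intercepting) return;
        // Skip requests that another listener already continued, aborted, or answered.
        if (request.isInterceptResolutionHandled()) return;
        try {
            await request.continue();
        } catch (err) {
            // Surface failures instead of leaving an unhandled rejection.
            log(`continue() failed for ${request.url()}: ${err.message}`);
        }
    };
}
```

It would be attached with page.on('request', makeRequestHandler(options.intercepting)). If no "continue() failed" lines appear while the images are still missing, the handler itself is probably not what is dropping them.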