Here's the scoop.
I'm trying to use Puppeteer v18.0.5 with the bundled Chromium browser against a specific website, running on Node v16.16.0. However, when I enable request interception via page.setRequestInterception(true), all of the HTTPRequests for image resources are lost: my handler is invoked far less often while intercepting than when not intercepting, and the page never fires any requests for images. When I disable interception, the page loads normally. Yes, I know about invoking continue() on all requests. I'm currently doing that in the request handler on the page.
I've also pored over the Puppeteer issues pages and found similar symptoms reported against some earlier Puppeteer versions, but those were all different issues that have since been resolved. This seems unique.
I've looked through Puppeteer source code as well as CDP events to try and find any explanation, but have found none.
As an important note for anyone trying to reproduce this: you must be proxied through a server in the general London area in order to load this site successfully.
Here's my code to reproduce:
const puppeteer = require('puppeteer');

(async () => {
  // Toggle `intercepting` to compare the two behaviours.
  const options = {
    browserWidth: 1366,
    browserHeight: 983,
    intercepting: false
  };

  const browser = await puppeteer.launch({
    args: [`--window-size=${options.browserWidth},${options.browserHeight}`],
    defaultViewport: {width: options.browserWidth, height: options.browserHeight},
    headless: false
  });

  // Reuse the tab that the browser opens on launch.
  const page = (await browser.pages())[0];

  // Log every request; continue it only when interception is enabled.
  page.on('request', async (request) => {
    console.log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()} | ${request._requestId}`);
    if (options.intercepting) await request.continue();
  });

  await page.setRequestInterception(options.intercepting);
  await page.goto('https://vegas.williamhill.com', {waitUntil: 'networkidle2', timeout: 65000});

  // To give a moment to view the page in headful mode before closing browser.
  await new Promise(resolve => setTimeout(resolve, 5000));
  await browser.close();
})();
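To rule out continue() failing silently inside the handler, the call could be wrapped so that errors are reported rather than lost. This is only a sketch, not a fix: `safeContinue` is a hypothetical helper name, and it assumes `isInterceptResolutionHandled()` is available on HTTPRequest in the installed Puppeteer version (the code falls back to calling continue() directly if it is not).

```javascript
// Hypothetical helper: continue an intercepted request and report failures.
// Resolves to true if continue() succeeded, false otherwise.
async function safeContinue(request, log = console.error) {
  // Skip requests that were already continued, aborted, or responded to.
  if (typeof request.isInterceptResolutionHandled === 'function' &&
      request.isInterceptResolutionHandled()) {
    return false;
  }
  try {
    await request.continue();
    return true;
  } catch (err) {
    // Surface the error instead of leaving an unhandled rejection.
    log(`continue() failed for ${request.url()}: ${err.message}`);
    return false;
  }
}
```

In the reproduction above, the handler's last line would become `if (options.intercepting) await safeContinue(request);`. If nothing is logged, the missing image requests are not explained by a failing continue().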
Here's what the page looks like with intercepting disabled (screenshot: "Expected Page Load").
Here's what the page looks like with intercepting enabled and continuing all requests (screenshot: "Page load while intercepting and continuing all requests").
With request interception disabled, my handler is invoked for 104 different requests, but with interception enabled it's invoked only 22 times. I'm not hitting a navigation timeout, as the .goto() method returns before my timeout each time.
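To see exactly which kinds of requests go missing between the two runs, the handler invocations could be tallied by resource type and then compared. The following is a minimal sketch; `trackRequests` and `diffCounts` are hypothetical helper names, and the only assumption about the page is that it emits 'request' events whose argument has a resourceType() method, as in the code above.

```javascript
// Hypothetical helper: tally 'request' events by resource type.
// Works with any object exposing on('request', handler), e.g. a Puppeteer Page.
function trackRequests(page) {
  const counts = new Map();
  page.on('request', (request) => {
    const type = request.resourceType();
    counts.set(type, (counts.get(type) || 0) + 1);
  });
  return counts;
}

// Hypothetical helper: for each resource type, report how many fewer
// requests the intercepted run saw than the baseline run.
function diffCounts(baseline, intercepted) {
  const missing = {};
  for (const [type, expected] of baseline) {
    const seen = intercepted.get(type) || 0;
    if (seen < expected) missing[type] = expected - seen;
  }
  return missing;
}
```

Calling `trackRequests(page)` before `page.goto()` in each run and passing the two resulting maps to `diffCounts` would show whether the drop from 104 to 22 invocations is confined to images or spread across other resource types as well.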
Any insight into what configuration/strategy I'm missing would be immensely appreciated.