
I wrote a simple program that only logs requests and responses, once with pyppeteer in Python, and (after I ran into the issues I will describe next) once with puppeteer in JavaScript. Here is the JS code:

const puppeteer = require('puppeteer');
const url = 'https://www.twitch.tv/';
(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);

    page.on('request', request => {
        console.log("REQUEST: " + request.url());
        request.continue();
    });

    page.on('response', response => {
        console.log("RESPONSE: " + response.url());
    });

    await page.goto(url, {waitUntil: ["networkidle0", "domcontentloaded"]});
    await browser.close();
})();

And here is the Python code:

import asyncio
from pyppeteer import launch

url = "https://www.twitch.tv/"

async def handle_request(request):
    print("REQUEST: ", request.url)
    await request.continue_()

async def handle_response(response):
    print("RESPONSE: ", response.url)

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.setRequestInterception(True)

    page.on('response', handle_response)
    page.on('request', handle_request)

    await page.goto(url, waitUntil=["networkidle0", "domcontentloaded"])
    await browser.close()

asyncio.get_event_loop().run_until_complete(main()) 

I then compare their output using:

> python3 script.py | grep "wasm"
REQUEST:  https://static.twitchcdn.net/assets/wasmworker.min-[redacted].js
REQUEST:  https://static.twitchcdn.net/assets/wasmworker.min-[redacted].wasm
> node script.js | grep "wasm"
(nothing)
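The grep comparison above can also be done in one step. Below is a minimal Python sketch that assumes the stdout of each script has been saved to a string. The helper names `urls_containing` and `only_in_first` are made up for illustration. The regex accepts both the two spaces that Python's print() puts after "REQUEST:" and the single space in the JS output.

```python
import re

# Matches "REQUEST:  <url>" (pyppeteer, print() adds a separator space)
# as well as "REQUEST: <url>" (puppeteer); same for RESPONSE lines.
LINE_RE = re.compile(r"^(REQUEST|RESPONSE):\s+(\S+)$")

def urls_containing(log_text, needle, kind="REQUEST"):
    """Return the set of logged URLs of the given kind that contain `needle`."""
    found = set()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(1) == kind and needle in m.group(2):
            found.add(m.group(2))
    return found

def only_in_first(first_log, second_log, needle="wasm"):
    """URLs containing `needle` requested in the first log but never in the second."""
    return urls_containing(first_log, needle) - urls_containing(second_log, needle)

# Example with hand-written logs in the format of the two scripts above.
py_log = "REQUEST:  https://static.twitchcdn.net/assets/wasmworker.min-[redacted].wasm\n"
node_log = "REQUEST: https://www.twitch.tv/\n"
print(sorted(only_in_first(py_log, node_log)))
```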

My issues with this:

(1) Why am I getting different results at all? Shouldn't Puppeteer and Pyppeteer use the exact same browser in the background, and (hopefully) the same default settings (such as the viewport size)?
(2) The Python version works better for my use case (subjectively), since it does log the requests. But why doesn't it log the corresponding responses? When running in non-headless mode, both requests show up in the developer console with a response code of 200. What could cause pyppeteer not to log these responses?
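To see exactly which requests never got a logged response, the REQUEST/RESPONSE lines can be paired up by URL. The following is a minimal sketch that assumes the log format produced by the scripts above. It matches responses to requests by count alone, so it does not distinguish redirects from failed requests. `unanswered_requests` is a hypothetical helper.

```python
import re
from collections import Counter

# Same line format as the logging scripts: "REQUEST: <url>" / "RESPONSE: <url>".
LINE_RE = re.compile(r"^(REQUEST|RESPONSE):\s+(\S+)$")

def unanswered_requests(log_text):
    """Return a Counter of URLs that were requested more often than answered."""
    requests, responses = Counter(), Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        target = requests if m.group(1) == "REQUEST" else responses
        target[m.group(2)] += 1
    # Counter subtraction drops zero and negative counts, leaving only
    # URLs with more requests than responses.
    return requests - responses
```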

I tried using different viewport sizes and enabling/disabling the cache, to no avail.

EDIT: Okay, the reason for (1) seems to be that pyppeteer is just outdated. Regarding (2): twitch.tv does not serve the file I am grepping for when running with puppeteer, and the streams do not work either. This happens even though I set up puppeteer to use the same Chrome executable and User-Agent string as when I visit the page manually, where it works. I thought it might have something to do with puppeteer disabling extensions, since the debug console shows some errors about cast_sender.js from the Chromecast extension. However, starting Chrome manually with exactly the same arguments that puppeteer uses still loads the files of interest.

  • It seems like they should do the same thing; it's probably an issue with different Chromium versions. – pguardiario Apr 17 '20 at 00:20
  • You're correct, the Python version uses `Chrome/69.0.3494.0`, while the JS version uses `Chrome/80.0.3987.0`... And maybe the page only sends out these requests depending on the browser version? (Which would be weird, but could explain both issues) – Aaron Hilbig Apr 17 '20 at 16:26
  • Follow-Up: The requests are not based on browser versions, since I can see them in the developer console with "normal" Chrome... – Aaron Hilbig Apr 17 '20 at 16:32
  • Follow-Up-Follow-Up: I also tried changing the UserAgent to my browser's UA, with no results there. – Aaron Hilbig Apr 17 '20 at 16:41
  • I just tried the puppeteer code with 78.0.3882.0 and it worked for me. – pguardiario Apr 18 '20 at 00:13
  • does not work for me with 81.0.4044.113 (see my edit above) – Aaron Hilbig Apr 19 '20 at 11:44
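When comparing versions like these, both libraries expose `browser.version()`, which returns a string such as `HeadlessChrome/80.0.3987.0`. The sketch below extracts the numeric version from such a string, or from a User-Agent, so that builds can be compared as numbers. The `chrome_version` helper is hypothetical.

```python
import re

# Finds "Chrome/<major>.<minor>.<build>.<patch>" anywhere in the string,
# so it also matches "HeadlessChrome/..." and full User-Agent strings.
VERSION_RE = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")

def chrome_version(text):
    """Return the Chrome version in `text` as a tuple of ints."""
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"no Chrome version found in {text!r}")
    return tuple(int(part) for part in m.groups())
```

Comparing tuples of integers avoids the string-comparison pitfall where "9.0" would sort after "10.0".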

1 Answer


Don't forget the lambda; it makes page.on call whatever function you want:

page.on('response', lambda res: interceptResponse(res))

page.on('request', lambda req: intercept(req))
turnerwhi
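Whether the lambda makes a difference depends on what the event emitter does with a listener's return value. pyppeteer's Page subclasses pyee's EventEmitter. As far as we know, the pyee versions pyppeteer used schedule a coroutine returned by a listener on the event loop. The sketch below uses a hypothetical `ToyEmitter` that mimics that assumed behaviour. The handler names come from the question and the answer but are only stand-ins here. Under this assumption, registering the coroutine function directly and wrapping it in a lambda both end up running the handler.

```python
import asyncio

class ToyEmitter:
    """Hypothetical stand-in for pyee's EventEmitter, for illustration only."""

    def __init__(self):
        self._listeners = {}

    def on(self, event, listener):
        self._listeners.setdefault(event, []).append(listener)

    def emit(self, event, *args):
        for listener in self._listeners.get(event, []):
            result = listener(*args)
            # Assumed pyee-like behaviour: a returned coroutine is scheduled
            # on the running loop rather than awaited by emit() itself.
            if asyncio.iscoroutine(result):
                asyncio.ensure_future(result)

seen = []

async def handle_request(url):
    seen.append(("direct", url))

async def intercept(url):
    seen.append(("lambda", url))

async def main():
    emitter = ToyEmitter()
    emitter.on("request", handle_request)              # coroutine function
    emitter.on("request", lambda req: intercept(req))  # lambda returning a coroutine
    emitter.emit("request", "https://www.twitch.tv/")
    await asyncio.sleep(0)  # yield once so the scheduled tasks can run
    return seen

result = asyncio.run(main())
print(result)
# [('direct', 'https://www.twitch.tv/'), ('lambda', 'https://www.twitch.tv/')]
```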