I am trying to capture XHR responses using Pyppeteer in Python. Here is my code.

import asyncio
from pyppeteer import launch
import json

async def intercept_response(res):
    resourceType = res.request.resourceType
    if resourceType in ['xhr']:
        resp = await res.text()
        try:
            r = json.loads(resp)
            print(res.request.url)
        except:
            pass
    return res.request.url

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    page.on('response', intercept_response)
    await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1')
    await page.goto('https://www.iesdouyin.com/share/user/70015326114')
    await page.waitForSelector('li.item goWork')
    await browser.close()

if __name__ == '__main__':
    url = asyncio.run(main())
    print('IS THIS WHAT YOU WANT:', url)

But when I run it, the browser never closes, and after 30 seconds it raises a TimeoutError. The code is also supposed to return the URL of the XHR response, but it doesn't.

jackliu

1 Answer

You are having this issue because the event emitter used by this version of pyppeteer doesn't support async event subscribers. The next version of the library, which is in active development at the time of writing, will allow for this. Until then, you can register an ordinary (synchronous) function as the handler and have it pass the coroutine to the event loop:

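async def intercept_response_async(res):
    # Only inspect XHR responses.
    if res.request.resourceType == 'xhr':
        resp = await res.text()
        try:
            json.loads(resp)  # raises ValueError if the body is not JSON
            print(res.request.url)
        except ValueError:
            pass

def intercept_response(res):
    # Synchronous handler for page.on('response', ...). The event loop is
    # already running when the handler fires, so run_until_complete() would
    # raise RuntimeError here; schedule the coroutine as a task instead.
    asyncio.ensure_future(intercept_response_async(res))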

Secondly, your code doesn't allow for "return the url of the xhr response". Your function main implicitly returns None. Attaching an event handler doesn't mean the handler's return value is magically returned from the function that attached it. Here's one way of accomplishing what I think you are trying to do, though:

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    resp_fut, interceptor = make_interceptor()
    page.on('response', interceptor)
    await page.goto('https://www.iesdouyin.com/share/user/70015326114')
    await page.waitForSelector('li.item goWork')
    resp = await resp_fut
    await browser.close()
    return resp
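
The main function above calls make_interceptor without defining it. Here is a minimal sketch of what such a helper could look like. It assumes you want the URL of the first XHR response whose body parses as JSON; the inner name inspect and that selection rule are my assumptions, not part of pyppeteer.

```python
import asyncio
import json


def make_interceptor():
    """Return (future, handler).

    The future resolves to the URL of the first XHR response whose body is
    valid JSON. Must be called while an event loop is running, e.g. from
    inside `async def main()`.
    """
    loop = asyncio.get_running_loop()
    url_future = loop.create_future()

    async def inspect(res):
        # Ignore everything that is not an XHR response.
        if res.request.resourceType != 'xhr':
            return
        try:
            json.loads(await res.text())
        except Exception:
            # Body could not be read or is not JSON: skip this response.
            return
        # Keep only the first match; later ones are ignored.
        if not url_future.done():
            url_future.set_result(res.request.url)

    def handler(res):
        # Synchronous wrapper so the event emitter can call it directly.
        asyncio.ensure_future(inspect(res))

    return url_future, handler
```

The handler stays synchronous, so it works with the current event emitter, while the actual inspection runs as a task on the same loop that main is awaiting on.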

This solution isn't the best though, as it will hang indefinitely if the future's result is never set. You may want to look at asyncio.wait_for, or better yet, just use the built-in Page.waitForRequest method (;
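
To illustrate the asyncio.wait_for suggestion, here is a small sketch that puts an upper bound on the wait. The helper name wait_for_url and the 10-second default are placeholders I made up; pick whatever suits your page.

```python
import asyncio


async def wait_for_url(url_future, timeout=10.0):
    """Return the future's result, or None if it is not set within `timeout` seconds."""
    try:
        return await asyncio.wait_for(url_future, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the future on timeout; report "nothing captured".
        return None
```

In main you would then write something like `resp = await wait_for_url(resp_fut)` instead of `resp = await resp_fut`, so the browser still gets closed when no matching response ever arrives.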

Mattwmaster58