I am trying to return a list of XHR URLs from an asyncio coroutine in Python. My code is below.

import asyncio
from pyppeteer import launch

async def intercept_response(res):
    resourceType = res.request.resourceType
    xhr_list = []
    if resourceType in ['xhr']:
        print(res.request.url)
        xhr_list.append(res.request.url)
    return xhr_list

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    page.on('response', intercept_response)
    await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1')
    await page.goto('https://www.iesdouyin.com/share/user/70015326114', waitUntil = 'networkidle2')
    await browser.close()

if __name__ == '__main__':
    url = asyncio.run(main())
    print(url)

However, when I run the code, res.request.url gets printed, but xhr_list is never returned, so url ends up as None. Is there something wrong with my code?

jackliu
  • `url` will be assigned whatever value you return from `main`. Since you don't return anything from it, `url` is set to `None`. – dirn May 17 '20 at 12:24

1 Answer

There are two problems with your code. First, intercept_response tries to construct a list, but the list is always freshly created and always consists of at most a single element. Since intercept_response is called multiple times, it should append to the same list.
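
To see the first problem in isolation, here is a minimal sketch that needs no browser. The names `fresh_list_handler` and `shared_list_handler` and the example URLs are made up for illustration; only the list handling mirrors the code in the question.

```python
def fresh_list_handler(url):
    # Mirrors the question: a new list is created on every call,
    # so it can never hold more than one URL.
    xhr_list = []
    xhr_list.append(url)
    return xhr_list


shared = []  # created once, outside the handler


def shared_list_handler(url):
    # Every call appends to the same list object.
    shared.append(url)
    return shared


# Hypothetical URLs standing in for three XHR responses.
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]

fresh_results = [fresh_list_handler(u) for u in urls]
for u in urls:
    shared_list_handler(u)

print([len(r) for r in fresh_results])  # three separate one-element lists
print(len(shared))                      # one list holding all three URLs
```

The first handler produces three unrelated lists, which is why only one URL appears to survive; the second accumulates all of them in one place.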

Second, page.on discards whatever the handler returns, so the collected URLs have to reach main some other way, and main must then actually return them. For example, you can use a closure (an inner def) that appends to a list defined in the enclosing scope:

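import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    url = []  # one list shared by every call to intercept_response
    async def intercept_response(res):
        # append to the list in the enclosing scope instead of creating a new one
        if res.request.resourceType == 'xhr':
            print(res.request.url)
            url.append(res.request.url)
    page.on('response', intercept_response)
    await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1')
    await page.goto('https://www.iesdouyin.com/share/user/70015326114', waitUntil = 'networkidle2')
    await browser.close()
    return url  # becomes the return value of asyncio.run(main())

if __name__ == '__main__':
    print(asyncio.run(main()))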
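
The same closure pattern can be tried without launching a browser. The sketch below is only an illustration: `fake_page_load`, `collect_xhr_urls`, `on_response`, and the example URLs are hypothetical stand-ins, not part of pyppeteer, and the fake loader awaits the handler directly rather than going through an event emitter.

```python
import asyncio


async def fake_page_load(handler):
    # Hypothetical stand-in for a page firing 'response' events:
    # it awaits the handler once per (url, resource_type) pair.
    events = [
        ("https://example.com/index.html", "document"),
        ("https://example.com/api/a", "xhr"),
        ("https://example.com/api/b", "xhr"),
    ]
    for url, resource_type in events:
        await handler(url, resource_type)


async def collect_xhr_urls():
    urls = []  # shared by every call to the inner handler

    async def on_response(url, resource_type):
        # The closure sees `urls` from the enclosing scope.
        if resource_type == "xhr":
            urls.append(url)

    await fake_page_load(on_response)
    return urls  # without this line, asyncio.run() would return None


result = asyncio.run(collect_xhr_urls())
print(result)
```

Here `asyncio.run` hands back whatever the top-level coroutine returns, so the list built up by the inner handler is available to the caller once the coroutine finishes.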
user4815162342
  • This code partially solved my problem, but one issue remains: three XHR URLs got printed, but only one was appended to xhr_list, so the returned url has only 1 element. I suspect this has something to do with how asyncio's await works? – jackliu May 17 '20 at 23:20
  • Is this because I am supposed to use asyncio.gather to get the return value? @user4815162342 – jackliu May 18 '20 at 01:34
  • @jackliu Your problem is unrelated to asyncio, your `intercept_response` always creates a fresh list. When called multiple times, it returns three different lists, and the last one gets used. I've now edited the code so that different invocations of `intercept_response` append to the same list. – user4815162342 May 18 '20 at 06:56
  • @jackliu No problem. If the question is now resolved, please remember to [accept](https://meta.stackexchange.com/a/5235/627709) the answer. – user4815162342 May 18 '20 at 20:57