
I am trying to use request interception (setRequestInterception) to speed up page loading with Pyppeteer.

However I am getting the warning:

RuntimeWarning: coroutine 'block_image' was never awaited

I can't figure out where I am missing an await. I've added awaits within the intercept function itself, following a template I found online. I am testing out request interception with Pyppeteer.

Thank you.

#utils.py

class MakeRequest():

    ua = User_Agent()

    async def _proxy_browser(self, url,
                             headless = False,
                             intercept_func = None,
                             proxy = True,
                             **kwargs):

        if proxy:
            args = [*proxy*
                '--ignore-certificate-errors']

        else:
            args = ['--ignore-certificate-errors']

        for i in range(3):
            try:
                browser = await launch(headless = headless,
                                       args = args,
                                       defaultViewport = None)
                
                page = await browser.newPage()
                await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0')
                
                if intercept_func is not None:
                    await page.setRequestInterception(True)
                    page.on('request', intercept_func)

                await page.goto(url, {'waitUntil' : 'load', 'timeout': 0 })
                content = await page.content()

                return content

            except (pyppeteer.errors.PageError,
                    pyppeteer.errors.TimeoutError,
                    pyppeteer.errors.BrowserError,
                    pyppeteer.errors.NetworkError) as e:
              print('error', e)
              time.sleep(2)
              continue

            finally:
                await browser.close()
        return 

scraper.py:

REQUESTER = MakeRequest()

async def block_image(request):

        if request.url.endswith('.png') or request.url.endswith('.jpg'):
            print(request.url)
            await request.abort()
        else:
            await request.continue_()


def get_request(url):

     for i in range(3):    
        response =  REQUESTER.proxy_browser_request(url = url,
                                                    headless = False,
                                                    intercept_func = block_image)

        if response:
            return response
        else:
            print(f'Attempt {i +1} : {url}links not found')
            print('retrying...')
            time.sleep(3)
MasayoMusic

1 Answer

Your function block_image is a coroutine, but the callback passed to page.on is expected to be a normal function. Try writing a synchronous lambda function that wraps the coroutine in a Task (thus scheduling it on the current event loop):

if intercept_func is not None:
    await page.setRequestInterception(True)
    page.on('request', lambda request: asyncio.create_task(intercept_func(request)))

There's an example of this kind of code in the Pyppeteer documentation. If you're using an older version of Python (<3.7), use asyncio.ensure_future instead of asyncio.create_task (as is done in the example in the docs).
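To see the pattern in isolation, here is a minimal sketch that runs without a browser. It assumes a hypothetical `TinyEmitter` class standing in for `page.on` (it only calls handlers synchronously and ignores their return value), and a simplified `block_image` that records its decision in a list instead of calling `request.abort()` or `request.continue_()`. None of these stand-ins are part of Pyppeteer.

```python
import asyncio


class TinyEmitter:
    """Hypothetical stand-in for an emitter that calls handlers synchronously."""

    def __init__(self):
        self._handlers = []

    def on(self, handler):
        self._handlers.append(handler)

    def emit(self, payload):
        for handler in self._handlers:
            # The return value is ignored, so a bare coroutine object
            # returned here would never be awaited.
            handler(payload)


async def block_image(url, log):
    # Simplified interceptor: record the decision instead of touching a request.
    await asyncio.sleep(0)
    action = "abort" if url.endswith((".png", ".jpg")) else "continue"
    log.append((action, url))


async def demo():
    log = []
    tasks = []  # keep references so the scheduled tasks can be awaited later
    emitter = TinyEmitter()
    # Synchronous wrapper: schedule the coroutine on the running event loop.
    emitter.on(lambda url: tasks.append(asyncio.ensure_future(block_image(url, log))))
    emitter.emit("https://example.com/a.png")
    emitter.emit("https://example.com/index.html")
    await asyncio.gather(*tasks)
    return log


result = asyncio.run(demo())
print(result)
```

Registering `block_image` directly with the emitter would only create coroutine objects that nobody awaits, which is what triggers the RuntimeWarning. The lambda turns each call into a scheduled task instead.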

Blckknght
    Thank you for this. I've awarded you the points, and I will accept your answer once I am able to test it out. I just need a few days or so, as I am having a git crisis. – MasayoMusic Aug 06 '21 at 04:57