
I'm trying to open a website in Chrome with pyppeteer, capture all the requests the website makes, and look at the headers. If my code finds a specific header, it should close the browser and stop running.

My code:

import asyncio
import json
import time
from pyppeteer import launch


async def intercept_network_requests(request):
    for key in request.headers:
        if 'some_header_name' in key:
            print('Got header value: ',request.headers[key])              
            #now i want to close the browser and stop the script
                            

   
async def main():        
    browser = await launch(headless=False, autoclose=False)
    page = await browser.newPage()
    page.on('request', lambda request: asyncio.ensure_future(intercept_network_requests(request)))
    await page.goto('https://example.com')
    time.sleep(10000)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

The script works and I get the string I'm looking for, but it just runs forever. I'm kinda new to Python and not sure how asyncio works. I tried to put the page.on('request', ...) into a while loop and set some variable to True when I find my header, but in that case it never continues to the await page.goto line.

How do I do this the right way?

  • when I remove the time.sleep(10000) line, it works the way I want, but I'm not sure if it's the right solution – Pejko Sep 21 '21 at 19:40

1 Answer


when I remove the time.sleep(10000) line, it works the way I want, but I'm not sure if it's the right solution

You're partially right: you don't need time.sleep, and even if you do need to sleep for some reason inside an async function, you should always use await asyncio.sleep(<seconds>) instead. But your code needs a few more improvements.
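To see why this matters, here is a small self-contained sketch (no pyppeteer involved, all names are made up for the demo) showing the difference: time.sleep freezes the whole event loop, so no other task makes progress, while await asyncio.sleep only suspends the current coroutine and lets other tasks, such as your request handler, keep running:

```python
import asyncio
import time


async def background():
    # stands in for work the event loop should do meanwhile,
    # e.g. running your request handler
    await asyncio.sleep(0.05)


async def demo(use_async_sleep):
    task = asyncio.ensure_future(background())
    if use_async_sleep:
        await asyncio.sleep(0.1)  # suspends only this coroutine
    else:
        time.sleep(0.1)           # blocks the entire event loop
    done = task.done()
    if not done:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
    return done


loop = asyncio.new_event_loop()
print(loop.run_until_complete(demo(True)))   # True: background() got to run
print(loop.run_until_complete(demo(False)))  # False: the loop was frozen
loop.close()
```

With time.sleep, the background task never even starts, because the event loop never gets control back during the sleep.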

When you run await page.goto('https://example.com'), the event loop is already waiting for the page to load (with a default 30-second timeout) and does not move on to the next statement until the page has loaded (or a timeout error is raised). During the page load, your intercept_network_requests is called asynchronously for every request; once the page has loaded, execution moves to the next statement and the browser is closed.

However, even after it finds the key, your handler keeps searching every subsequent request. You can skip the search once the key is found, or even block all further requests:

header_found = False

async def intercept_network_requests(request):
    global header_found
    if header_found:
        await request.abort()  # optional, if you don't want to load anything more
        # A more advanced way to stop the page load is
        # request._client.send("Page.stopLoading"), but that needs the
        # proper config passed to page.goto.
        return  # the key was already found, so don't search this request

    for key in request.headers:
        if 'some_header_name' in key:
            print('Got header value: ', request.headers[key])
            header_found = True
            break  # stop searching
    await request.continue_()  # with interception enabled, every request
                               # must be resumed (or aborted) explicitly


async def main():
    browser = await launch(headless=False, autoclose=False)
    page = await browser.newPage()
    await page.setRequestInterception(True)  # request.abort() does not work without this
    page.on('request', lambda request: asyncio.ensure_future(intercept_network_requests(request)))
    await page.goto('https://example.com')
    await browser.close()
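Note that the snippet above only stops scanning; main() still waits for page.goto to finish before closing the browser. If you want to close the browser the moment the header shows up, one common pattern is an asyncio.Event that the request handler sets and that main() waits on with a timeout. Here is a minimal, pyppeteer-free sketch of that signaling pattern; found, on_request and fake_page_load are hypothetical stand-ins for the event, the 'request' callback, and page.goto:

```python
import asyncio


async def main():
    found = asyncio.Event()

    def on_request(headers):
        # stand-in for the pyppeteer 'request' callback
        if any('some_header_name' in key for key in headers):
            found.set()

    async def fake_page_load():
        # stand-in for page.goto: requests arrive while the page loads
        for headers in ({'accept': '*/*'}, {'x-some_header_name': 'abc'}):
            on_request(headers)
            await asyncio.sleep(0)

    load = asyncio.ensure_future(fake_page_load())
    try:
        await asyncio.wait_for(found.wait(), timeout=5)
        result = 'header found, close the browser here'
    except asyncio.TimeoutError:
        result = 'load finished without the header'
    await load
    return result


loop = asyncio.new_event_loop()
print(loop.run_until_complete(main()))
loop.close()
```

In the real script you would wrap page.goto in asyncio.ensure_future the same way, then call await browser.close() right after found.wait() returns instead of waiting for the full page load.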
Faizan AlHassan