
I am looking for guidance on best practices with asyncio and aiohttp in Python 3. I have a basic scraper, but I am unsure about two things:

  1. How to properly implement error handling, specifically around my fetch function (I have added a rough sketch of what I mean after the code below).
  2. Do I really need the main() function at the end to wrap my async crawler?

Here is my code so far. It works, but I would like feedback on the two items above.

import asyncio

from aiohttp import ClientSession
from bs4 import BeautifulSoup

urls = []


async def fetch(url, payload={}):
    # Open a session, request the page, and return the raw response body
    async with ClientSession() as s:
        async with s.get(url, params=payload) as resp:
            content = await resp.read()
            return content


async def get_profile_urls(url, payload):
    # Parse the page and collect the profile links into the global list
    content = await fetch(url, payload)
    soup = BeautifulSoup(content, 'html.parser')
    soup = soup.find_all(attrs={'class': 'classname'})
    if soup:
        urls.extend([s.find('a')['href'] for s in soup])


async def main():
    tasks = []
    payload = {
        'page': 0,
        'filter': 88}
    for i in range(max_page + 1):
        payload['page'] += 1
        tasks.append(get_profile_urls(search_ulr, payload))
    await asyncio.wait(tasks)


asyncio.run(main())
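
For item 1, this is roughly the direction I was thinking of taking fetch. The choice of exceptions (aiohttp.ClientError plus asyncio.TimeoutError) and the idea of returning None on failure are just my guesses at something reasonable, which is exactly the part I would like feedback on:

import asyncio

from aiohttp import ClientError, ClientSession


async def fetch(url, payload={}):
    # Rough sketch: treat connection problems, 4xx/5xx responses and
    # timeouts as failures, log them, and return None so the caller
    # can simply skip that page.
    try:
        async with ClientSession() as s:
            async with s.get(url, params=payload) as resp:
                resp.raise_for_status()
                return await resp.read()
    except (ClientError, asyncio.TimeoutError) as exc:
        print(f'Request to {url} failed: {exc!r}')
        return None

With that version, get_profile_urls would also need an `if content is None: return` guard before handing the content to BeautifulSoup.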
Ale M.
  • This is more of a code review question, but there are also many ways to handle errors. What is your _desired_ behavior when a request for a URL fails? – gold_cy Jun 05 '20 at 01:28
  • Your question #2 is unclear. How would you avoid having this function? Also, what is `search_ulr`? It doesn't seem to be defined anywhere. – user4815162342 Jun 05 '20 at 12:29

0 Answers