
I am trying to use the asyncio and aiohttp packages to request a web page. However, the response I get back is:

<p class="warning-title"> Please upgrade your web browser. </p>  <br/>
<p class="p-top-30">This browser version is outdated, and may not be fully compatible with our website. Please upgrade to a newer version or use another browser.    </p>

It doesn't load the page I'm trying to access; it returns the homepage instead.

CODE

from fake_useragent import UserAgent
import ssl
from bs4 import BeautifulSoup
import asyncio
import aiohttp

ua = UserAgent()

# Browser-like request headers with a Chrome User-Agent string from fake_useragent
hdr = {'User-Agent': str(ua.chrome),
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.5',
       'Connection': 'keep-alive'}

# Disable certificate verification for these requests
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE

url = '...'

async def parse_website(session):
    # Fetch the target page and print the parsed HTML
    async with session.get(url) as response:
        html = await response.text()

    soup = BeautifulSoup(html, 'html.parser')

    print(soup)

async def main():
    # Limit concurrency to three requests at a time
    async with asyncio.Semaphore(3):
        async with aiohttp.TCPConnector(ssl=ssl_ctx, limit=None) as connector:
            async with aiohttp.ClientSession(connector=connector, headers=hdr) as session:
                for i in range(1):
                    await parse_website(session)

asyncio.run(main())

I have tried omitting the `headers` argument (`async with aiohttp.ClientSession(connector=connector) as session:`), but then the response says I didn't wait long enough for the captcha. So I have to pass `headers` to get past the captcha, yet with it I consistently get the "Please upgrade your web browser" response. I also tried adding `cookies={}` to the same line (`async with aiohttp.ClientSession(connector=connector, headers=hdr, cookies={}) as session:`), but I get the same response saying the browser is out of date.
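For reference, here is the cookie variant in full, using the same `hdr`, `ssl_ctx`, and `parse_website` as above (`main_with_cookies` is just a name for this attempt; the no-headers attempt is identical except that it drops `headers=hdr`):

# Same setup as above, but with an empty cookie jar passed in.
# (Dropping headers=hdr entirely is the variant that hits the captcha page.)
async def main_with_cookies():
    async with aiohttp.TCPConnector(ssl=ssl_ctx, limit=None) as connector:
        async with aiohttp.ClientSession(connector=connector,
                                         headers=hdr,
                                         cookies={}) as session:
            await parse_website(session)

asyncio.run(main_with_cookies())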

I'm only showing one URL request here. Once this works I'll scale it to thousands of URLs, which is why I want to make it work with asyncio and aiohttp.
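For context, this is roughly how I imagine the scaled-up version, reusing `hdr` and `ssl_ctx` from above (`main_many`, `fetch_one`, and the `urls` list are placeholders; the semaphore caps it at three requests in flight at a time):

async def main_many(urls):
    # Cap concurrency at three in-flight requests, like the Semaphore(3) above
    sem = asyncio.Semaphore(3)

    async def fetch_one(session, url):
        async with sem:
            async with session.get(url) as response:
                return await response.text()

    connector = aiohttp.TCPConnector(ssl=ssl_ctx, limit=None)
    async with aiohttp.ClientSession(connector=connector, headers=hdr) as session:
        pages = await asyncio.gather(*(fetch_one(session, u) for u in urls))

    for html in pages:
        print(BeautifulSoup(html, 'html.parser'))

# e.g. asyncio.run(main_many(['...', '...']))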

Could someone tell me where I'm going wrong here?

Alex F
  • What is the response code you are getting? Which site do you want to access? Are you sure they are not preventing it in some way? – The Fool Dec 02 '19 at 20:54
  • I was using `connector=aiohttp.TCPConnector(ssl=False)`; maybe that helps. – The Fool Dec 02 '19 at 20:57
  • Your issue is most likely with your `url`, and not so much so with Python. Impossible for any of us to debug it without knowing the website you are trying to scrape. – felipe Dec 05 '19 at 03:51

0 Answers