
I have a set of URLs (same HTTP server, different request parameters). I want to keep requesting all of them, asynchronously or in parallel, until I kill the process.

I started by using threading.Thread() to create one thread per URL, with a while True: loop in the requesting function. This was already faster than a single thread issuing one request at a time, of course, but I would like better performance.

Then I tried the aiohttp library to run the requests asynchronously. My code looks like this (FYI, each URL is composed of url_base and product.id, and each URL uses a different proxy for its request):

import asyncio
import aiohttp

# url_base, hermes_products and global_proxy are not shown here.

async def fetch(product, i, proxies, session):

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}

    # Request the same URL over and over through this product's proxy.
    while True:
        try:
            async with session.get(
                url_base + product.id,
                proxy=proxies[i],
                headers=headers,
                ssl=False,
            ) as response:
                content = await response.read()
                print(content)
        except Exception as e:
            print('ERROR ', str(e))


async def startQuery(proxies):
    tasks = []
    # One shared session; one never-ending task per product.
    async with aiohttp.ClientSession() as session:
        for i, product in enumerate(hermes_products):
            task = asyncio.ensure_future(fetch(product, i, proxies, session))
            tasks.append(task)
        await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
loop.run_until_complete(startQuery(global_proxy))

I observed two things: 1) It is not as fast as I expected; it is actually slower than using threads. 2) More importantly, the requests only returned normally at the beginning of the run, and soon almost all of them failed with errors like:

ERROR  Cannot connect to host PROXY_IP:PORT ssl:False [Connect call failed ('PROXY_IP', PORT)]

or

ERROR  503, message='Too many open connections'

or

ERROR  [Errno 54] Connection reset by peer

Am I doing something wrong here (particularly with the while True loop)? If so, how can I achieve my goal properly?

Wayee
  • The errors you mentioned show that the server/proxy you are reaching has identified your requests as a `DDoS attack` and blocked further connections. – Liju Aug 24 '20 at 15:06
  • But why didn't I get those errors when I used synchronous calls with `Request` or `urllib3`? And since each request, as shown in the code, uses a different proxy, I don't see why the requests would be identified as a `DDoS attack`. – Wayee Aug 24 '20 at 15:42
  • Your first error seems to come from the proxy itself, the second is caused by parallel connections, and the third shows you are probably blocked. My wild guess is that the server/proxy checks the number of simultaneous connections and the interval between them as one of its security checks. – Liju Aug 24 '20 at 15:50
  • Thanks, that sounds reasonable. I'll contact my proxy service provider to see if they have restrictions on simultaneous connections. If that is the case, what would be the best practice for getting optimal performance in my case? – Wayee Aug 24 '20 at 15:53
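
If the guess in the comments is right and the proxy or server limits simultaneous connections and the interval between them, one possible direction is to cap how many requests are in flight and to pause between attempts, waiting longer after an error. The following is only a minimal sketch of that idea, not a tested fix for the errors above. The names poll, run_all, fetch_once, limit, delay, max_delay and rounds are made up for illustration, and fetch_once is a hypothetical coroutine standing in for one session.get call from the question.

```python
import asyncio


async def poll(fetch_once, key, semaphore, delay=0.5, max_delay=30.0, rounds=None):
    """Call fetch_once(key) repeatedly, holding the shared semaphore per call.

    rounds=None loops until cancelled; an integer stops after that many attempts.
    """
    wait = delay
    attempts = 0
    while rounds is None or attempts < rounds:
        attempts += 1
        try:
            async with semaphore:  # blocks while `limit` calls are in flight
                await fetch_once(key)
            wait = delay  # success: go back to the base pause
        except asyncio.CancelledError:
            raise  # let task cancellation stop the loop
        except Exception as exc:
            print('ERROR ', exc)
            wait = min(wait * 2, max_delay)  # failure: double the pause, capped
        await asyncio.sleep(wait)


async def run_all(fetch_once, keys, limit=5, **kwargs):
    """Start one polling task per key, sharing a single concurrency cap."""
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop
    await asyncio.gather(*(poll(fetch_once, key, semaphore, **kwargs) for key in keys))
```

In the question's setup, fetch_once would presumably wrap the body of the async with session.get(...) block and keys would be the products. Suitable values for limit and delay depend on what the proxy provider allows, which is not known here. Note that with delay=0 the doubling has no effect, so a non-zero base pause is assumed.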
