I have a set of URLs (same HTTP server but different request parameters). What I want to achieve is to keep requesting all of them asynchronously or in parallel until I kill the process.
I started by using threading.Thread() to create one thread per URL, with a while True: loop in the requesting function. This was of course already faster than a single thread making one request at a time, but I would like to achieve better performance.
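For context, the threaded version was essentially like this (a simplified sketch, not my exact code; request_loop and the urls list are placeholders for how I actually build the URLs):

    import threading
    import requests

    def request_loop(url, proxy):
        # each thread keeps re-requesting its own URL until the process is killed
        while True:
            try:
                r = requests.get(url, proxies={'http': proxy, 'https': proxy})
                print(r.content)
            except Exception as e:
                print('ERROR ', str(e))

    for url, proxy in zip(urls, global_proxy):
        threading.Thread(target=request_loop, args=(url, proxy)).start()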
Then I tried the aiohttp library to run the requests asynchronously. My code looks like this (FYI, each URL is composed of url_base and product.id, and each URL uses a different proxy for the request):
    import asyncio
    import aiohttp

    # url_base, hermes_products and global_proxy are defined elsewhere

    async def fetch(product, i, proxies, session):
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'}
        while True:  # keep requesting this URL until the process is killed
            try:
                async with session.get(
                        url_base + product.id,
                        proxy=proxies[i],
                        headers=headers,
                        ssl=False) as response:
                    content = await response.read()
                    print(content)
            except Exception as e:
                print('ERROR ', str(e))

    async def startQuery(proxies):
        tasks = []
        async with aiohttp.ClientSession() as session:
            for i, product in enumerate(hermes_products):
                task = asyncio.ensure_future(fetch(product, i, proxies, session))
                tasks.append(task)
            await asyncio.gather(*tasks)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(startQuery(global_proxy))
My observations are: 1) it is not as fast as I expected, and actually slower than using threads; 2) more importantly, the requests only returned normally at the beginning of the run, and soon almost all of them started failing with errors like:
ERROR Cannot connect to host PROXY_IP:PORT ssl:False [Connect call failed ('PROXY_IP', PORT)]
or
ERROR 503, message='Too many open connections'
or
ERROR [Errno 54] Connection reset by peer
Am I doing something wrong here (particularly with the while True loop)? If so, how can I achieve my goal properly?
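For example, would throttling be the right direction, something like the sketch below? (The TCPConnector limit and the asyncio.sleep delay are just guessed values; same imports and globals as above.)

    async def fetch(product, i, proxies, session):
        headers = {'User-Agent': 'Mozilla/5.0'}  # same UA string as above, shortened here
        while True:
            try:
                async with session.get(
                        url_base + product.id,
                        proxy=proxies[i],
                        headers=headers,
                        ssl=False) as response:
                    content = await response.read()
                    print(content)
            except Exception as e:
                print('ERROR ', str(e))
            await asyncio.sleep(0.5)  # guessed pause so the proxies are not hammered

    async def startQuery(proxies):
        # cap the number of simultaneous connections in the pool (guessed limit)
        connector = aiohttp.TCPConnector(limit=50)
        async with aiohttp.ClientSession(connector=connector) as session:
            tasks = [asyncio.ensure_future(fetch(product, i, proxies, session))
                     for i, product in enumerate(hermes_products)]
            await asyncio.gather(*tasks)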