I am sending about 130 asynchronous GET requests using httpx and asyncio in Python, through a proxy that I set up myself on AWS.
In the Python script, I print the time just before each request is sent and can see that all of them are sent within 70 ms. However, when I time each request by taking the current time immediately after it returns, some requests take up to 30 seconds! The completion times are spread fairly evenly over that period, so I get back about 3-5 responses every second for 30 seconds.
I used tcpdump and Wireshark to look at the packets coming back, and all the application data appears to arrive within 4 seconds (including the TCP handshakes), so I don't understand the reason for the delay in Python.
The TCP teardowns happen up to 35 seconds later, so maybe that is the reason for the delay? Does httpx wait for the connection to close (FIN and ACK) before the awaited get() call returns and the response can be read?
What can I try to speed this up?
Here is a simplified version of my code:
import asyncio
from datetime import datetime

import httpx

from utils import store_data, get_proxy_addr

CLIENT = None


async def get_and_store_thing_data(thing):
    t0 = datetime.now()
    res = await CLIENT.get('https://www.placetogetdata.com', params={'thing': thing})
    t1 = datetime.now()
    # It's this line that shows the time is anywhere from 0-30 seconds for the
    # request to return
    print(f'time taken: {t1-t0}')
    data = res.json()
    store_data(data)
    return data


def get_tasks(things):
    # Build one coroutine per thing; nothing runs until they are awaited
    tasks = []
    for thing in things:
        task = get_and_store_thing_data(thing)
        tasks.append(task)
    return tasks


async def run_tasks(tasks, proxy_addr):
    global CLIENT
    CLIENT = httpx.AsyncClient(proxies={'https://': proxy_addr})
    try:
        # gather() accepts bare coroutines; asyncio.wait() rejects them on Python 3.11+
        await asyncio.gather(*tasks)
    finally:
        await CLIENT.aclose()


def run(things):
    proxy_addr = get_proxy_addr()
    tasks = get_tasks(things)
    asyncio.run(run_tasks(tasks, proxy_addr))
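
One thing that might be worth ruling out is that the delay comes from the event loop rather than from the network. The sketch below does not use httpx at all: fake_request and blocking_store are hypothetical stand-ins for CLIENT.get() and store_data(), and it assumes store_data() is synchronous and slow, which may not be true for the real script. Every simulated response is ready after 50 ms, yet the measured times spread out because each coroutine has to wait for the blocking work of the ones resumed before it.

```python
import asyncio
import time


async def fake_request(delay):
    # Hypothetical stand-in for CLIENT.get(): the "response" is ready after `delay` seconds.
    await asyncio.sleep(delay)


def blocking_store(seconds):
    # Hypothetical stand-in for a synchronous store_data() that blocks the event loop.
    time.sleep(seconds)


async def worker(results, block_for):
    t0 = time.perf_counter()
    await fake_request(0.05)
    t1 = time.perf_counter()
    # Same measurement as in the question: time until this coroutine is resumed.
    results.append(t1 - t0)
    blocking_store(block_for)


async def measure(n, block_for):
    # Run n workers concurrently and collect each one's measured "request" time.
    results = []
    await asyncio.gather(*(worker(results, block_for) for _ in range(n)))
    return results


if __name__ == "__main__":
    timings = asyncio.run(measure(10, 0.02))
    print(f"fastest: {min(timings):.3f}s, slowest: {max(timings):.3f}s")
```

If the real script behaves like this, t1 - t0 includes the time spent waiting for other coroutines' blocking work and not only the network time, which would be consistent with the packets arriving long before Python reports the responses.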