I've tried using both httpx and aiohttp, and both seem to hit this hard-coded limit of ~100 concurrent requests.

import asyncio

import aiohttp
import httpx


async def main():
    client = aiohttp.ClientSession() 
    # client = httpx.AsyncClient(timeout=None)

    coros = [
        client.get(
            "https://query1.finance.yahoo.com/v8/finance/chart/",
            params={"symbol": "ADANIENT.NS", "interval": "2m", "range": "60d",},
        )
        for _ in range(500)
    ]

    for i, coro in enumerate(asyncio.as_completed(coros)):
        await coro
        print(i, end=", ")


asyncio.run(main())

Output -

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99

And it's just stuck at 99 with both libraries.

This doesn't happen if a new Session is used for every request, though (sketched below).
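
For reference, the "new Session per request" variant is roughly this (just a sketch; fetch_once is an illustrative name, the request is the same as above) -

async def fetch_once():
    # Throwaway session per call, so every request gets its own
    # connection pool instead of sharing one.
    async with aiohttp.ClientSession() as client:
        await client.get(
            "https://query1.finance.yahoo.com/v8/finance/chart/",
            params={"symbol": "ADANIENT.NS", "interval": "2m", "range": "60d"},
        )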

What am I doing wrong? Isn't the whole point of asyncio to make things like this easy?


I tried rewriting this with threads, zmq and requests, and it works great -

from threading import Thread

import requests
import zmq

N_WORKERS = 100
N_ITERS = 500

ctx = zmq.Context.instance()


def worker():
    client = requests.Session()

    pull = ctx.socket(zmq.PULL)
    pull.connect("inproc://#1")

    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://#2")

    while True:
        if not pull.recv_pyobj():
            return

        r = client.get(
            "https://query1.finance.yahoo.com/v8/finance/chart/",
            params={"symbol": "ADANIENT.NS", "interval": "2m", "range": "60d",},
        )
        push.send_pyobj(r.content)


def ventilator():
    push = ctx.socket(zmq.PUSH)
    push.bind("inproc://#1")

    # distribute tasks to all workers
    for _ in range(N_ITERS):
        push.send_pyobj(True)

    # close down workers
    for _ in range(N_WORKERS):
        push.send_pyobj(False)



# start workers & ventilator
threads = [Thread(target=worker) for _ in range(N_WORKERS)]
threads.append(Thread(target=ventilator))
for t in threads:
    t.start()

# pull results from workers
pull = ctx.socket(zmq.PULL)
pull.bind("inproc://#2")

for i in range(N_ITERS):
    pull.recv_pyobj()
    print(i, end=", ")

# wait for workers to exit
for t in threads:
    t.join()

Dev Aggarwal

1 Answer

The problem is that client.get(...) returns a response object that holds a live handle to an OS-level socket. Failing to close that object results in aiohttp running out of sockets, i.e. hitting the connector limit, which is 100 by default.
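
A quick way to confirm that the blockage is on the client side is to give the session a TCPConnector with a much smaller limit, say 10 - the code from the question then gets stuck correspondingly earlier. A rough sketch, reusing the question's request:

import asyncio

import aiohttp


async def main():
    # With limit=10 the pool is exhausted after ~10 unclosed responses,
    # instead of the default 100.
    client = aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=10))

    coros = [
        client.get(
            "https://query1.finance.yahoo.com/v8/finance/chart/",
            params={"symbol": "ADANIENT.NS", "interval": "2m", "range": "60d"},
        )
        for _ in range(500)
    ]

    for i, coro in enumerate(asyncio.as_completed(coros)):
        await coro  # the responses are never released, so the pool drains
        print(i, end=", ", flush=True)


asyncio.run(main())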

To fix the problem, you need to close the object returned by client.get(), or use async with, which ensures the object is closed as soon as the with block is done. For example:

import asyncio

import aiohttp


async def get(client):
    async with client.get(
            "https://query1.finance.yahoo.com/v8/finance/chart/",
            params={"symbol": "ADANIENT.NS", "interval": "2m", "range": "60d"}) as resp:
        pass  # read/process resp here; exiting the block releases the connection


async def main():
    async with aiohttp.ClientSession() as client:
        coros = [get(client) for _ in range(500)]
        for i, coro in enumerate(asyncio.as_completed(coros)):
            await coro
            print(i, end=", ", flush=True)


asyncio.run(main())

Additionally, the aiohttp.ClientSession object should also be closed, which can also be accomplished using async with, as shown above.
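
If you'd rather not use async with for the session itself, the explicit equivalent is roughly this sketch - close the session in a finally block:

async def main():
    client = aiohttp.ClientSession()
    try:
        coros = [get(client) for _ in range(500)]
        for i, coro in enumerate(asyncio.as_completed(coros)):
            await coro
            print(i, end=", ", flush=True)
    finally:
        # Explicitly release the session's connector and its sockets.
        await client.close()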

user4815162342
  • Using a higher limit fixes it. Since I don't know the number of requests I need to fulfil beforehand, should I just set it to math.inf? – Dev Aggarwal Aug 17 '20 at 10:39
  • Won't that nullify the point of having a shared session? – Dev Aggarwal Aug 17 '20 at 10:41
  • @DevAggarwal You can use `None` rather than `math.inf`, which might just work by accident. I don't think it nullifies the use of sharing a session, because the session contains things other than the connector. Having said that, I would still not raise the limit before understanding why the blockage occurs, perhaps even reporting it as a bug to the aiohttp maintainers. – user4815162342 Aug 17 '20 at 10:56
  • Do you think it might be a macOS thing? I can't believe this is a bug after all those "million requests" blog posts using python asyncio. – Dev Aggarwal Aug 17 '20 at 13:29
  • 1
    @DevAggarwal I figured out the problem, the trouble is that you never closed the objects returned by `client.get()`. See the edited answer. – user4815162342 Aug 18 '20 at 08:38
  • Can you please share your process of figuring this out? Teach a man to fish... – Dev Aggarwal Aug 20 '20 at 19:17
  • 1
    @DevAggarwal An interesting challenge - let me try! First, I took another look at your example and realized that it's actually runnable, i.e. that it doesn't require some in-house setup. I ran it and, sure enough, reproduced the issue. I then modified the code to instantiate the session with a much smaller-limit `TCPConnector`, say 10. Still the same hang - so it's not a server-side block, and highly unlikely to be an aiohttp bug. – user4815162342 Aug 20 '20 at 19:49
  • 1
    I scratched my head thinking, what could possibly prevent the connector from creating new connections? As you said in the question, this kind of parallel download is pretty much the most basic thing one can try with aiohttp, and there are a bunch of tutorials how to do exactly that, which run just fine. (I myself have written some of them here.) So I shifted my attention to what you're _doing_ with the downloaded data - and found - nothing! There was a `client.get`, right there, passed to `as_completed`, just like that. – user4815162342 Aug 20 '20 at 19:51
  • 1
    It dawned on me that `client.get` doesn't return data, but a response handle that knows how to retrieve the data. And this handle has a handle to the connection, and it certainly must be closed in order for that connection to be released (either closed or either reused for another request). At this point I was almost certain I found the issue, as flags pointing to that piled up. Like if you've ever seen aiohttp code, it always has all these `async with` blocks to deal with the response - they're written precisely to ensure that response objects are released. So, that's it, hope it helps. – user4815162342 Aug 20 '20 at 19:52