
Problem I'm trying to solve: I'm making many API requests to a server, and I'm trying to create delays between async API calls to comply with the server's rate-limit policy.

What I want it to do: I want the code to behave like this:

  1. Make API request #1
  2. Wait 0.1 seconds
  3. Make API request #2
  4. Wait 0.1 seconds ... and so on ...
  5. Repeat until all requests are made
  6. Gather the responses and return the results in one object (results)

Issue: When I introduced asyncio.sleep() or time.sleep() into the code, it still made the API requests almost instantaneously. It seemed to delay the execution of print(), but not the API requests. I suspect that I have to create the delays within the loop, not in fetch_one() or fetch_all(), but I couldn't figure out how to do so.
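
Here is a minimal demo of the behavior, with no network calls involved (the helper name task is just for illustration) - all the prints fire together after about 0.1 s instead of being spaced apart:

import asyncio
import time

async def task(i, delay):
    await asyncio.sleep(delay)  # all five sleeps run concurrently...
    print("task", i, "ran at", time.time())  # ...so these all fire together

async def main():
    await asyncio.gather(*[task(i, 0.1) for i in range(5)])

asyncio.run(main())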

Code block:

import asyncio
from ssl import SSLContext

import aiohttp

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):

    #time.sleep(delay)
    #asyncio.sleep(delay)

    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp.text()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

Versions I'm using: 
python                    3.8.5
aiohttp                   3.7.4
asyncio                   3.4.3

I would appreciate any tips on guiding me to the right direction!

Aaron Ahn

2 Answers


The call to asyncio.gather will launch all requests "simultaneously" - and on the other hand, if you simply used a lock or awaited each task in turn, you would not gain anything from using parallelism at all.

The simplest thing to do, if you know the rate at which you can issue the requests, is to increase the asynchronous pause before each request in succession - a simple global variable can do that:


import asyncio
from ssl import SSLContext

import aiohttp

next_delay = 0.1

async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):
    global next_delay

    # each task reserves a start time `delay` seconds after the previous one,
    # so the actual network requests are spaced out by `delay`
    next_delay += delay
    await asyncio.sleep(next_delay)

    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            # print("An api call to ", url, " is made at ", time.time())
            # print(resp)
            return await resp.text()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

Now, if you want to, say, issue 5 requests and then issue the next 5, you could use a synchronization primitive like asyncio.Event, as the code below does (asyncio.Condition would also work, using its wait_for on an expression that checks how many API calls are active):

import asyncio
from ssl import SSLContext

import aiohttp

active_calls = 0

MAX_CALLS = 5

async def fetch_all(loop, urls, delay):
    event = asyncio.Event()
    event.set()
    results = await asyncio.gather(*[fetch_one(loop, url, delay, event) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay, event):
    global active_calls

    # wait until the current batch of calls has drained
    await event.wait()

    active_calls += 1
    if active_calls >= MAX_CALLS:
        # the batch is full: block the remaining tasks at event.wait() above
        event.clear()

    try:
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()
    finally:
        active_calls -= 1
        if active_calls == 0:
            # the whole batch is done: release the next batch
            event.set()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))

For both examples, should your design need to avoid global variables (strictly speaking, these are "module" variables), you could either move all the functions into a class, work on an instance, and promote the global variables to instance attributes, or use a mutable container, such as a list holding the active_calls value in its first item, and pass that as a parameter. A sketch of the class-based variant follows.
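
A minimal sketch of that class-based variant of the first example (the class name RateLimitedFetcher and its layout are just illustrative):

import asyncio
from ssl import SSLContext

import aiohttp

class RateLimitedFetcher:
    # holds the growing delay as an instance attribute instead of a module global
    def __init__(self, delay):
        self.delay = delay
        self.next_delay = 0.0

    async def fetch_all(self, loop, urls):
        return await asyncio.gather(
            *[self.fetch_one(loop, url) for url in urls],
            return_exceptions=True,
        )

    async def fetch_one(self, loop, url):
        # reserve a progressively later start time for each task
        self.next_delay += self.delay
        await asyncio.sleep(self.next_delay)
        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                return await resp.text()

fetcher = RateLimitedFetcher(delay=0.1)
loop = asyncio.get_event_loop()
loop.run_until_complete(fetcher.fetch_all(loop, ['some string list of urls']))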

jsbueno
  • Thank you for giving me such a good tip! So many tools and ways to do things in Python. – Aaron Ahn Jul 06 '21 at 15:38
  • Exactly what I was looking for, thank you – Rodin Oleksandr Oct 07 '21 at 16:32
  • Great answer, thanks! But why are we (in your first example) incrementing `next_delay`? Intuitively, I would expect that to mean that the last of, say, 1000 requests sleeps for 100s instead of 0.1s, but somehow that does not seem to be the case. – Mophotla Nov 11 '22 at 07:06
  • The `next_delay` increase is so that each task spaces out the execution of its "core" part, therefore spacing the actual network requests by the value of `delay`. It is more of an example, as this would prevent any actual parallelism and issue one request at a time. With a little tweaking this can be changed so that 10 requests are made at a time, in parallel, maintaining the API usage throughput (actually, just fine-tuning the "delay" amount here could do that). – jsbueno Nov 11 '22 at 13:43

When you use asyncio.gather you run all fetch_one coroutines concurrently. All of them wait for delay together, then make their API calls almost simultaneously.

To solve the issue, you should either await the fetch_one calls one by one in fetch_all, or use a Semaphore to signal that the next call shouldn't start before the previous one is done (a sketch of the one-by-one variant follows the Semaphore example below).

Here's the idea:

import asyncio
from ssl import SSLContext

import aiohttp

_sem = asyncio.Semaphore(1)


async def fetch_all(loop, urls, delay):
    results = await asyncio.gather(*[fetch_one(loop, url, delay) for url in urls], return_exceptions=True)
    return results

async def fetch_one(loop, url, delay):

    async with _sem:  # the next coroutine(s) will wait here until the previous one is done
        await asyncio.sleep(delay)

        async with aiohttp.ClientSession(loop=loop) as session:
            async with session.get(url, ssl=SSLContext()) as resp:
                # print("An api call to ", url, " is made at ", time.time())
                # print(resp)
                return await resp.text()

delay = 0.1
urls = ['some string list of urls']
loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_all(loop, urls, delay))
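
And for completeness, a minimal sketch of the first alternative - awaiting the calls one by one inside fetch_all, with no Semaphore at all (the helper names fetch_one_plain and fetch_all_sequential are illustrative):

import asyncio
from ssl import SSLContext

import aiohttp

async def fetch_one_plain(loop, url):
    # the same request logic, without the Semaphore
    async with aiohttp.ClientSession(loop=loop) as session:
        async with session.get(url, ssl=SSLContext()) as resp:
            return await resp.text()

async def fetch_all_sequential(loop, urls, delay):
    results = []
    for url in urls:
        try:
            results.append(await fetch_one_plain(loop, url))
        except Exception as exc:  # mirror gather(..., return_exceptions=True)
            results.append(exc)
        await asyncio.sleep(delay)  # pause between consecutive requests
    return results
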
Mikhail Gerasimov