I have a simple Python function with the following pseudo-code:

while True:

    # 1 - CPU Intensive calculations(Synchronous)
    
    
    # 2 - Take the result from the synchronous calculation and POST it to a server

The while loop runs forever and does CPU intensive calculations for the first half of the loop. All of that is run synchronously and that's alright. What I'd like to do is shave off some time by making the POST request asynchronous. The while loop needs to run forever, however I'd like the request to be made asynchronously so that the loop can continue with the next iteration without waiting for the request to resolve.

Would like to know what is the best way to achieve this with asyncio without using any extra threads/greenlets/processes

Edit


Before posting the question here, I have tried this approach:

import asyncio
import httpx

async def example_function():
    # Synchronous CPU Intensive calculations

    client = httpx.AsyncClient()
    # Make a call to the REST API
    await client.post()


async def main():
    while True:
        await asyncio.gather(example_function(), example_function())


if __name__ == "__main__":
    asyncio.run(main())

I am completely new to asynchronous programming with Python. I tried the gather method, but the flaw in my approach is that the second call to example_function() does not make its POST request asynchronously. I understand that asyncio.gather() schedules a task for each of the coroutines passed to it, and if one of the tasks awaits, execution continues with the next one. However, I need to run example_function() in a loop forever, not just n times.

Varun
  • What if the second request needs to be sent and the first one is not yet done? – user4815162342 Jul 09 '20 at 15:54
  • 99% of the time I don't foresee that happening, since the computations are quite intense and typically take much longer than a single request. However, if for some reason the (n)th request takes long, I would still like to be able to make the (n + 1)th request. – Varun Jul 09 '20 at 16:23
  • Welcome to Stack Overflow. Please be aware this is not a code-writing service. We can help solve specific, technical problems, not open-ended requests for code or advice. Please edit your question to show what you have tried so far, and what specific problem you need help with. See the How to Ask page for details on how to best help us help you. – MisterMiyagi Jul 09 '20 at 16:51
  • Is there any reason why you want to use ``asyncio``? Do you have any ``async`` code in place already? How long does #1 take compared to #2, and why don't you want to use threads? Unless #1 is very short and #2 is very long, a single separate thread to process #2 from a queue should be enough. – MisterMiyagi Jul 09 '20 at 16:54
  • @MisterMiyagi Sorry, I am relatively new here. – Varun Jul 10 '20 at 01:52
  • Why do I want to use asyncio and not threads? After breaking it down, I realize that my problem has 2 parts - First is the synchronous CPU intensive task and the second is the IO which makes requests to a server. While the CPU intensive tasks are not something that can take advantage of asyncio, the IO operations are definitely something that can benefit from it. While I am not restricting myself to just asyncio, I just thought it would be better to make use of asyncio for the IO tasks. – Varun Jul 10 '20 at 02:01
  • The benefit of asyncio is for concurrent requests at the scale of 10.000 and above. Judging by your description, you have one concurrent request that is done before the next. A single thread to perform the requests sequentially is more than sufficient for this. Resist the temptation to use asyncio just because it is cool. – MisterMiyagi Jul 10 '20 at 05:13
  • @MisterMiyagi While I agree that some people post to this tag trying to use asyncio just to manage threads, this question is certainly not like that and didn't warrant getting closed. First, the question was very **clear**, it was obvious what the OP wanted to achieve, and the OP was very responsive with adding details as requested. Second, the asked problem is about *combining* CPU-bound computation with networking, which is well within the domain of asyncio. Finally, many modern Python networking libraries are asyncio-only, and it is perfectly reasonable to learn to use asyncio this way. – user4815162342 Jul 10 '20 at 07:43
  • @user4815162342 I don't feel like discussing the closing (that two more people agreed to without interacting with the OP in any way) in comments is adequate. Suffice to say, I do not consider the details sufficient to say what is "the best way"– and suffice to say, you are free to disagree and vote to reopen. If you feel that does not suffice, feel free to bring up the issue in chat. – MisterMiyagi Jul 10 '20 at 07:51
  • @MisterMiyagi I did vote to reopen, and the comment was an explanation of my reasons for doing so. Opinion-based questions invite flurries of different and often contradictory answers based on preferences, and that is clearly not the case here. I would even say that my answer was pretty much the canonical way to do it in asyncio (but I'm obviously biased, the answer being mine). – user4815162342 Jul 10 '20 at 08:00
  • Hi guys, opinions on the question will remain opinions. user4815162342 and I will seem biased since we're in favor of the question. However, as a newbie around here, what I'd like to ask MisterMiyagi is what exactly was not right about the question? Was it the way the question was asked? If so, what specifically made it an unsuitable way of framing the question? Or was it that the question itself was wrong? I agree that words like "best way to" are totally subjective. – Varun Jul 10 '20 at 16:23
  • As for the benefits of using asyncio, is there a rule book that says it should only be used when dealing with massive numbers of concurrent IO operations? My use case is currently not like that, but it could be in the future, and I'd like to be able to cater for it at that point. – Varun Jul 10 '20 at 16:32

1 Answer

Would like to know what is the best way to achieve this with asyncio without using any extra threads/greenlets/processes

If your code is synchronous, you will need to use some form of multi-threading, but you can do it in a clean way through the thread pool provided by asyncio for that purpose. For example, if you run your sync code through loop.run_in_executor(), the event loop will keep spinning while the calculations are running, and the tasks placed in the background will be serviced. This allows you to use asyncio.create_task() to run the second part of the loop in the background or, more precisely, in parallel with the rest of the event loop.

import asyncio
import aiohttp

def calculations(arg1):
    # ... synchronous code here
    ...

async def post_result(result, session):
    async with session.post('http://httpbin.org/post',
                            data={'key1': result}) as resp:
        resp.raise_for_status()

async def main():
    loop = asyncio.get_running_loop()
    arg1 = ...  # placeholder for whatever input calculations() needs

    # ClientSession is an async context manager, so it needs "async with"
    async with aiohttp.ClientSession() as session:
        while True:
            # run the sync code off-thread so the event loop
            # continues running, but wait for it to finish
            result = await loop.run_in_executor(None, calculations, arg1)

            # don't await the task, let it run in the background
            asyncio.create_task(post_result(result, session))

if __name__ == '__main__':
    asyncio.run(main())
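
To see the effect without a server, here is a minimal timing sketch of the same pattern. Everything in it is an assumption made for illustration: `time.sleep` stands in for the calculation, `asyncio.sleep` stands in for the POST, and the durations, the `run`/`compare` helpers and the `in_background` flag are hypothetical names, not part of the code above. It also keeps a reference to each background task, which the asyncio documentation recommends so that a task cannot be garbage-collected before it finishes.

```python
import asyncio
import time

CALC_SECONDS = 0.05  # assumed cost of one calculation
POST_SECONDS = 0.05  # assumed latency of one POST


def calculations():
    # Placeholder for the synchronous work; runs in the executor thread.
    time.sleep(CALC_SECONDS)
    return 42


async def post_result(result):
    # Placeholder for the HTTP request; yields to the event loop while waiting.
    await asyncio.sleep(POST_SECONDS)


async def run(iterations, in_background):
    loop = asyncio.get_running_loop()
    pending = set()
    start = time.perf_counter()
    for _ in range(iterations):
        result = await loop.run_in_executor(None, calculations)
        if in_background:
            task = asyncio.create_task(post_result(result))
            # Hold a strong reference until the task is done.
            pending.add(task)
            task.add_done_callback(pending.discard)
        else:
            # Baseline: wait for each POST before the next calculation.
            await post_result(result)
    if pending:
        # Let the posts that are still in flight finish before timing stops.
        await asyncio.gather(*pending)
    return time.perf_counter() - start


def compare(iterations=5):
    awaited = asyncio.run(run(iterations, in_background=False))
    background = asyncio.run(run(iterations, in_background=True))
    return awaited, background


if __name__ == "__main__":
    awaited, background = compare()
    print(f"awaited: {awaited:.2f}s, background: {background:.2f}s")
```

With these stand-ins the background variant should finish sooner, because each simulated POST overlaps with the next calculation. Real CPU-bound Python code contends with the event loop thread for the GIL, so the gain in a real program may be smaller than this sketch suggests.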
user4815162342
  • Thank you for the quick reply. It's an interesting solution, I will surely try it out in a bit! Just to make sure that I understand the concept clearly, am I right to say that this will run the IO tasks in the event loop, while running the synchronous code in another thread? Just out of curiosity, I would also love to find out which program would be more efficient in terms of the resources utilized. Would the traditional synchronous version of the program be more efficient, or this version, which runs an event loop along with a separate thread for the computations? – Varun Jul 09 '20 at 17:09
  • You understood correctly. One thread will run the synchronous stuff (the thread will be reused as it's part of a thread pool), and the main thread will run all IO tasks, no matter how many of them pile up (but, as you said, in 99% cases there will be only one at a time). – user4815162342 Jul 09 '20 at 17:38
  • @Varun As for efficiency, I don't know what kind of synchronous program you're comparing this to. It would still have to have at least two threads, one for the calculation, and the other for the call to `requests.post` or whatever. I would expect its performance to be indistinguishable from this one. Asyncio provides performance benefits when you start a large number of tasks in parallel, where a sync program would have to start an equal number of threads, which doesn't scale as well. – user4815162342 Jul 09 '20 at 18:15
  • Thank you very much for your help :) I had been trying to make this work using asyncio methods like asyncio.gather(), but running the synchronous operations in a separate thread while running the IO in the async event loop sounds like a much better solution. Cheers – Varun Jul 10 '20 at 01:02
  • @Varun Since you have non-async code, you definitely need a thread in the mix - nothing you can do with asyncio (`gather`, `create_task`, etc.) will change the fact that asyncio is single-threaded and non-async code blocks the event loop. This is often the biggest misunderstanding by people who first encounter asyncio. Also, please note that if the answer resolves your question, you can [accept it](https://meta.stackexchange.com/a/5235/627709). – user4815162342 Jul 10 '20 at 08:14
  • Thank you very much for being patient and understanding my question. Because of your answer, the concept is now crystal clear in my head. The gist of the solution is to run the IO operations in the single-threaded event loop while running the blocking sync code in a separate thread. I have implemented this and successfully managed to make it work as intended. While this does not bring me enormous benefits over 1 or 2 iterations, collectively it does help, because my program runs for prolonged periods and serves a real-time application, so every millisecond saved is worth it. – Varun Jul 10 '20 at 16:40
  • @user4815162342 Your answer and subsequent comments are very useful for people who are just starting out with async, like me. One thing I noticed in applying your solution: it works when the 'calculation' loop never ends, but if the loop runs for a limited time, the last request(s) may be lost because the ClientSession is destroyed before the last tasks are finished. In my case, every last request is lost due to a RuntimeError('Session is closed'). – jscheppers May 16 '22 at 08:48
  • @jscheppers Good point. In that case you should be collecting tasks into a simple list initialized with `tasks = []`, and add an `await asyncio.gather(*tasks)` after the loop. – user4815162342 May 16 '22 at 09:01
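
The fix suggested in the last comment can be sketched as follows for a loop that ends. This is only an illustration under stated assumptions: `FakeSession` is a hypothetical stand-in for `aiohttp.ClientSession` that needs no network and mimics the "Session is closed" failure, and `calculations` is a trivial placeholder. The `gather` call is placed after the loop but still inside the `async with` block, which is one reading of the comment: the session has to stay open until the outstanding requests finish.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession; no network involved."""

    def __init__(self):
        self.closed = False
        self.posted = []

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True

    async def post(self, payload):
        await asyncio.sleep(0.01)  # simulated network latency
        if self.closed:
            # Mimics the error reported in the comment above.
            raise RuntimeError("Session is closed")
        self.posted.append(payload)


def calculations(n):
    # Placeholder for the synchronous, CPU-intensive step.
    return n * n


async def main(iterations):
    loop = asyncio.get_running_loop()
    tasks = []
    async with FakeSession() as session:
        for n in range(iterations):
            result = await loop.run_in_executor(None, calculations, n)
            # Start the POST in the background and remember the task.
            tasks.append(asyncio.create_task(session.post(result)))
        # Wait for every outstanding POST while the session is still open.
        await asyncio.gather(*tasks)
    return session.posted


if __name__ == "__main__":
    print(asyncio.run(main(3)))
```

Moving the `gather` call outside the `async with` block in this sketch would let the session close while the last POST is still in flight, reproducing the lost-request symptom.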