
My jobs are each a series of requests that need to be made per object. I.e., each is a User with several data points (potentially hundreds) that need to be added to that user via requests. I had originally written those requests to run synchronously, but it was blocking and slow. I was sending each User job to Python RQ and had 10 workers going through the Users sent down the queue: 1 worker, 1 user, blocking requests.

I've re-written my User job to use aiohttp instead of python requests, and it's significantly faster. The Python RQ documentation says that 'Any Python function call can be put on an RQ queue,' but I can't figure out how to send my async function down the queue.


async def get_prices(calls: List[dict]) -> List[dict]:
    async with aiohttp.ClientSession() as session:
        for price in prices.items():
            price_type, date = price
            price = await pg.get_price(
                session=session, lookup_date=date
            )
        do_some_other_stuff()
        # the session is closed automatically when the async with block exits


from core.extensions import test_queue
from prices import get_prices
job = test_queue.enqueue(get_prices, kwargs={"username":'username'})

The problem is that get_prices is never awaited; it just remains an unawaited coroutine object. How can I await my function on the queue?

phil0s0pher

3 Answers


Since python-rq doesn't support asyncio directly, you can use a synchronous function that calls asyncio.run instead.

import asyncio
from typing import List

async def _get_prices(calls: List[dict]) -> List[dict]:
    ...  # the aiohttp implementation from the question

def get_prices(*args, **kwargs):
    # synchronous wrapper that RQ can call; runs the coroutine to completion
    return asyncio.run(_get_prices(*args, **kwargs))

Note, however, that asyncio.run only works if there is no event loop already running in the current thread. If you expect an asyncio loop to already be running, schedule the coroutine on it with loop.create_task instead.

def get_prices(*args, **kwargs):
    loop = asyncio.get_event_loop()
    coro = _get_prices(*args, **kwargs)
    # schedule the coroutine on the existing loop; it runs when the loop regains control
    loop.create_task(coro)

Then, when python-rq calls get_prices, it will cause the async function to be executed.
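With the wrapper in place, the enqueue call from the question works unchanged, since RQ only ever sees a plain synchronous function. A minimal sketch (the queue import comes from the question, and the kwargs are illustrative placeholders):

from core.extensions import test_queue
from prices import get_prices  # the synchronous wrapper defined above

# RQ serialises a call to the plain function; asyncio.run happens inside the worker process
job = test_queue.enqueue(get_prices, kwargs={"calls": [{"lookup_date": "2021-01-01"}]})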

Another option would be to not use asyncio for making the requests at all, e.g. by using grequests or threads, which work with ordinary synchronous functions, as in the sketch below.
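For illustration, here is a minimal thread-based version of the job; it assumes each call dict carries a url key, which is a made-up detail just for the sketch:

from concurrent.futures import ThreadPoolExecutor
from typing import List

import requests

def get_prices(calls: List[dict]) -> List[dict]:
    # plain synchronous function, so RQ can run it directly;
    # the thread pool provides the per-request concurrency that aiohttp was giving
    urls = [call["url"] for call in calls]
    with ThreadPoolExecutor(max_workers=10) as pool:
        responses = list(pool.map(requests.get, urls))
    return [response.json() for response in responses]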

sytech
  • This doesn't work, as the worker processing the job still blocks until `get_prices` is done and cannot dequeue other jobs while waiting for `_get_prices` to finish. It is still a synchronous worker. – Gera Zenobi Mar 31 '23 at 19:36
  • @GeraZenobi it sounds like you have a somewhat different problem than the one being addressed here. Each RQ worker is typically a separate process (and therefore a separate event loop), so they don't block each other. But the main point in this question is to ensure that the async function can be called and run (and it works, with all its async concurrency working as implemented), not necessarily that `asyncio.run` can be called multiple times to get concurrent execution of multiple tasks. If you want 1 worker to be able to run multiple async tasks concurrently, that _may_ be somewhat different. – sytech Mar 31 '23 at 20:22
  • You are indeed correct @sytech. I was looking for a way to 'patch' RQ so that I would get asynchronous support within the same RQ worker. I believe it is not possible, unfortunately, and would require rewriting the Worker class quite a bit. Note that what you suggested here is now built into the library: https://github.com/rq/rq/pull/1405 ; there, in the same way, the worker blocks until the coroutine is done. – Gera Zenobi Apr 01 '23 at 10:23

You might consider using arq.

Created by the maintainer of Pydantic, it is not the same thing, but it was inspired by rq.

Besides, it is still Redis and queues (now with asyncio).

From the docs:

Job queues and RPC in python with asyncio and redis.

arq was conceived as a simple, modern and performant successor to rq.

Simple usage:

import asyncio
from aiohttp import ClientSession
from arq import create_pool
from arq.connections import RedisSettings

async def download_content(ctx, url):
    session: ClientSession = ctx['session']
    async with session.get(url) as response:
        content = await response.text()
        print(f'{url}: {content:.80}...')
    return len(content)

async def startup(ctx):
    ctx['session'] = ClientSession()

async def shutdown(ctx):
    await ctx['session'].close()

async def main():
    redis = await create_pool(RedisSettings())
    for url in ('https://facebook.com', 'https://microsoft.com', 'https://github.com'):
        await redis.enqueue_job('download_content', url)

# WorkerSettings defines the settings to use when creating the worker,
# it's used by the arq cli.
# For a list of available settings, see https://arq-docs.helpmanual.io/#arq.worker.Worker
class WorkerSettings:
    functions = [download_content]
    on_startup = startup
    on_shutdown = shutdown

if __name__ == '__main__':
    asyncio.run(main())
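
The worker is then started with the arq CLI, e.g. arq demo.WorkerSettings if the snippet above lives in demo.py, as described in the arq docs.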
Ramon Dias

Following up on @sytech's answer: what he suggested is now supported natively in RQ after the introduction of this PR: https://github.com/rq/rq/pull/1405. You don't need to do anything extra, as long as your job function is a coroutine function (async def get_prices).

Note, however, that this doesn't mean the worker itself is asynchronous; rather, it can run job functions that are coroutines. As expected, the worker still blocks until the coroutine is done without doing anything else; only the job function itself runs on an event loop.
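For completeness, a minimal sketch of relying on that built-in support; the Redis connection, queue, and kwargs here are illustrative placeholders, and the job body is the async def from the question:

from redis import Redis
from rq import Queue

from prices import get_prices  # the async def from the question

# with an rq version that includes the PR above, the coroutine function is
# enqueued directly and the worker awaits it for you
queue = Queue(connection=Redis())
job = queue.enqueue(get_prices, kwargs={"calls": [{"lookup_date": "2021-01-01"}]})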

Gera Zenobi