4

I don't understand Python RQ very well; I just started learning about it.

There is a task_a that takes 3 minutes to finish processing.

import time

from redis import Redis
from rq.decorators import job

# the @job decorator needs a queue name and a Redis connection
@job('default', connection=Redis())
def task_a():
    time.sleep(180)
    print('done processing task_a')

def call_3_times():
    # .delay() only enqueues the jobs; separate rq worker processes run them
    task_a.delay()
    task_a.delay()
    task_a.delay()

From what I observed, the `task_a` jobs are executed one by one from the queue: after the first call finishes, the worker proceeds to the next call, and so on. The total time taken is 3 minutes × 3 = 9 minutes.

How can I make each `task_a` call in `call_3_times` execute in parallel, so the total time taken is less than 9 minutes, say around 3 minutes and 10 seconds (just an example; it would probably be faster than that)?

I could probably spawn 3 RQ workers, and yes, that does work faster and roughly in parallel. But what if I need to call it 2000 times? Should I spawn 2000 RQ workers? There must be a better way to do that.

Minah
  • You're correct that each worker will have to finish processing `task_a` before doing the next task. You probably only want a number of workers equal to the number of cores you have (e.g. 2 or 4 on a desktop/laptop). Please describe what `task_a` is actually doing and someone may be able to propose a better solution. – supersam654 Apr 28 '17 at 02:54

2 Answers

6

If you need to call the task 2000 times, you can create 2000 jobs in the queue and have only 3 workers working in parallel, three at a time, until all jobs are done, as sketched below.
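
A minimal sketch of that pattern, assuming a local Redis instance and that `task_a` lives in a module called `tasks` (both are just assumptions for the example):

from redis import Redis
from rq import Queue

from tasks import task_a  # hypothetical module where task_a is defined

q = Queue('default', connection=Redis())

# enqueue 2000 jobs; they wait in Redis until a worker picks them up
for _ in range(2000):
    q.enqueue(task_a)

Then start a small, fixed number of workers (for example, run `rq worker default` in three terminals) and they will drain the queue three jobs at a time.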

The number of workers depends on the spec of your server. It's obviously not practical to start 2000 workers in an attempt to parallelize all jobs at once. If you really need to process thousands of jobs at once, you have two choices:

  1. Distribute the jobs on a farm of workers (multiple servers)
  2. Add concurrency within each worker function, so that each worker spawns new threads or processes to do the actual work.

Choice #2 depends on what type of work you're doing (I/O-bound or CPU-bound). If it's I/O-bound and thread-safe, use threads in the worker function; otherwise, use multiprocessing, with the trade-off of increased resource usage. However, if you have the resources to spawn multiple processes, why not just increase the worker count in the first place, which is less complex?
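
For illustration, here is a rough sketch of choice #2 for I/O-bound work, using a thread pool inside a single queued job; the `fetch` helper and the pool size of 10 are only examples:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # illustrative I/O-bound unit of work: download one page
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def task_batch(urls):
    # one queued job that fans out to 10 threads internally,
    # so a single worker overlaps many I/O waits at once
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(fetch, urls))

You would then enqueue `task_batch` with a chunk of URLs (e.g. `q.enqueue(task_batch, urls)`) instead of one tiny job per item.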

So to summarize, it depends on your task type: if it's I/O-bound, you can do #1 or #2; if it's CPU-bound, your choice is limited to #1, subject to the spec of your server.

nafooesi
3

If you use RQ, the answer is yes: you need to spawn more workers to get concurrency.

This is from the rq website: http://python-rq.org/docs/workers/

Each worker will process a single job at a time. Within a worker, there is no concurrent processing going on. If you want to perform jobs concurrently, simply start more workers.
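
Besides running the `rq worker` command several times, you can also start a worker from Python. A minimal sketch, assuming Redis on localhost and the default queue name (details may vary slightly between RQ versions):

from redis import Redis
from rq import Queue, Worker

redis_conn = Redis()

# work() blocks and handles one job at a time, so parallelism comes
# from running this script (or `rq worker`) in several processes
worker = Worker([Queue('default', connection=redis_conn)], connection=redis_conn)
worker.work()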


If you want another solution, try Celery: http://docs.celeryproject.org

Then you can do something like:

celery worker --concurrency=10

It provides worker-level concurrency, so you don't need to spawn 2000 workers or anything like that.
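
A minimal Celery sketch of the same task, assuming a Redis broker on localhost (the module name `tasks` and the broker URL are just assumptions):

# tasks.py
import time

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def task_a():
    time.sleep(180)
    print('done processing task_a')

With a recent Celery you would start it with something like `celery -A tasks worker --concurrency=10`, and that one worker process then runs up to 10 jobs at a time.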

gushitong
  • so how do people who use python rq solve this kind of issue? – Minah Apr 28 '17 at 18:33
  • I think your explanation is not comprehensive enough. Take a look here http://python-rq.org/docs/workers/ and find "The ability to use different concurrency models such as multiprocessing or gevent." – Minah Apr 28 '17 at 19:49
  • @Minah were you able to achieve this multiprocessing and concurrency? – Ronnie Jun 22 '19 at 12:12