
I have converted a standalone batch job to use celery for dispatching the work to be done. I'm using RabbitMQ. Everything is running on a single machine and no other processes are using the RabbitMQ instance. My script just creates a bunch of tasks which are processed by workers.

Is there a simple way to measure the time from the start of my script until all tasks are finished? I know that this is a bit complicated by design when using message queues. But I don't want to do it in production, just for testing and getting a performance estimate.
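
Roughly, the setup looks like this (a simplified sketch with placeholder names, not the real code):

from celery import Celery

app = Celery('tasks', broker='amqp://localhost')

@app.task
def process_item(item):
    # placeholder for the real work a worker performs
    pass

def dispatch():
    # the script just fires off a bunch of tasks and exits
    for item in range(1000):
        process_item.delay(item)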

Achim

2 Answers


You could use Celery signals: functions registered for task_prerun and task_postrun are called before and after each task is executed, so measuring the elapsed time is trivial:

from time import time
from celery.signals import task_prerun, task_postrun


# maps task_id -> timestamp recorded just before the task runs
task_start_times = {}

@task_prerun.connect
def task_prerun_handler(signal, sender, task_id, task, args, kwargs, **extras):
    task_start_times[task_id] = time()


@task_postrun.connect
def task_postrun_handler(signal, sender, task_id, task, args, kwargs, retval, state, **extras):
    try:
        cost = time() - task_start_times.pop(task_id)
    except KeyError:
        cost = -1
    print(task.__name__, cost)
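
If you also want the total wall-clock time as seen from the dispatching script, rather than per-task cost, a minimal sketch is to block until everything has finished (this assumes the tasks can be collected into a group and that a result backend is configured; my_task and items are placeholders):

from time import time
from celery import group

start = time()
result = group(my_task.s(item) for item in items).apply_async()
result.get()  # blocks until every task in the group has finished
print('total elapsed:', time() - start)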
georgexsh
  • @vikas-prasad `kwargs` is for receiving "task keyword arguments"; I added `**extras` for Celery 4 compatibility. – georgexsh Aug 14 '18 at 09:52

You could use a chord by adding a fake task at the end that is passed the time at which the tasks were sent and, when executed, prints the difference between the current time and that start time.

import datetime
from celery import chord, shared_task

@shared_task
def dummy_task(res=None, start_time=None):
    # the chord callback receives the header results as its first argument
    print(datetime.datetime.now() - start_time)

def send_my_task():
    # my_task is the task you actually want to profile (defined elsewhere);
    # start_time must be serializable by your task serializer
    chord([my_task.s()], dummy_task.s(start_time=datetime.datetime.now())).delay()

`send_my_task` sends the task that you want to profile along with a `dummy_task` that prints how long it took (more or less). If you want more accurate numbers, I suggest passing the start time directly to your tasks and using the signals.
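
For example, a minimal sketch of that last suggestion, using a plain timestamp so it serializes with any serializer (the task name and its argument are placeholders):

from time import time
from celery import shared_task

@shared_task
def my_task(payload, start_time=None):
    # ... do the real work with payload ...
    if start_time is not None:
        # time from when the caller enqueued the task until it finished
        print('elapsed:', time() - start_time)

# caller side: stamp each task with the dispatch time
my_task.delay('some payload', start_time=time())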

Maciej Gol
  • But dummy_task will be another task and can be executed on a different worker, or significantly later than the original task. – homm Nov 19 '14 at 15:14
  • @homm, yes, but the OP explicitly stated that there is a single worker node and no other processes are using the RabbitMQ node, so only the tasks we are measuring are being processed. The only delay comes from the time-measuring task being received last, but the chord is on a 1-second periodic timer. – Maciej Gol Nov 19 '14 at 15:39
  • No other processes, but not "no other tasks", right? If there are no free worker processes, dummy_task will wait. – homm Nov 19 '14 at 19:35
  • @homm, yes, but the OP said that no process other than his script uses the queue, and the OP wants to measure the time from the start of the script up to when _all_ tasks have finished. – Maciej Gol Nov 19 '14 at 19:40