
I have a Celery task that launches three other Celery tasks. I want these child tasks to execute asynchronously and to wait for them to finish before resuming the parent task. However, the child tasks are running synchronously and I don't know why. The problem started when I upgraded Celery from 4.4.7 to 5.0.0.

app_celery.py

@app.task(name="app_celery.scraping_process", soft_time_limit=900, time_limit=960, max_retries=3)
def scraping_process():
    sources = ["a", "b", "c"]
    job = group((company_representation.s(src) for src in sources))
    result = job.apply_async(queue="spiders", routing_key="spiders")
    while not result.ready():
        time.sleep(5)
    
@app.task(name="app_celery.company_representation", max_retries=3)
def company_representation(source: str):
    # do something
    time.sleep(60)
    

I am running celery like this:

celery -A app_celery worker -c 8 -Q spiders -n spiders@%%h
celery -A app_celery worker -c 2 -Q companies -n companies@%%h --without-mingle --without-heartbeat -Ofair

celery==5.0.0

  • `job.apply_async` will not be running synchronously. Is it possible that your Celery cluster has only 1 worker? – Kris Feb 28 '22 at 12:57
  • @Kris it should have eight because I am running Celery like this: celery -A app_celery worker -c 8 -Q spiders – iam.mattevans Feb 28 '22 at 13:21
  • Are you sure the tasks are being executed? Sync or async? And is this running on Windows? – Kris Feb 28 '22 at 13:57
  • You should refactor your task to use [Chord](https://docs.celeryproject.org/en/stable/userguide/canvas.html#chords) – DejanLekic Feb 28 '22 at 17:16
  • @DejanLekic I don't want to use Chord because I want the three child tasks to run at the same time and not one after the other, and there is no task to run after the children have finished executing. The problem is that the child tasks do not run async – iam.mattevans Mar 01 '22 at 08:52
  • @Kris I am sure they are getting executed, only sync instead of async. If I use --pool eventlet, the tasks start executing async, but I need to use the default prefork because with eventlet the scrapy code I am running in the tasks does not run correctly – iam.mattevans Mar 01 '22 at 11:07
  • @iam.mattevans - in fact, a Chord is a Group + final task chained together (maybe that is why you misunderstood how it works). All group tasks are executed **in parallel** (if there are enough worker processes, of course)... As to what you say about not having a task to run after the grouped tasks have finished - well, make one! :) – DejanLekic Mar 01 '22 at 11:26
  • @DejanLekic that is actually a great idea and is probably a neater way to do what I am trying to do. Thank you, I will look into that. – iam.mattevans Mar 01 '22 at 13:02
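
For reference, a minimal sketch of the chord approach suggested in the comments above. It reuses app and company_representation from the question; the finalize callback task is hypothetical and only there to give the chord a body:

from celery import chord

@app.task(name="app_celery.finalize")
def finalize(results):
    # runs once, only after every company_representation task has finished;
    # `results` is the list of their return values
    return results

@app.task(name="app_celery.scraping_process", soft_time_limit=900, time_limit=960, max_retries=3)
def scraping_process():
    sources = ["a", "b", "c"]
    # the header tasks run in parallel on the "spiders" queue; the body (finalize)
    # runs after all of them succeed, so the parent never has to block and poll
    header = [company_representation.s(src).set(queue="spiders") for src in sources]
    chord(header)(finalize.s())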

2 Answers


You could add the task IDs to a list and then do something like:

from celery.result import AsyncResult

def poll_job_status(active_jobs):
    # fast path: a single task id - keep the list as-is while it is still running
    if len(active_jobs) == 1:
        task = AsyncResult(active_jobs[0])
        if not task.ready():
            return active_jobs
    _new_active_jobs = []
    for task_id in active_jobs:
        task = AsyncResult(task_id)
        # keep only the tasks that have not reached a final state yet
        if not task.ready():
            _new_active_jobs.append(task_id)
    return _new_active_jobs

So you iterate over the list of task IDs and check whether each task is complete. Once the list is empty you know that all the tasks have run and you can carry on with other operations. Example usage:

active_tasks_list = []
active_tasks_list.append(task.delay(args).id)
while len(active_tasks_list) > 0:
    # reassign: poll_job_status returns only the tasks that are still running
    active_tasks_list = poll_job_status(active_tasks_list)
# carry on with other processes

This is ideal if you have numerous tasks that you want to keep track of.
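
If the children are launched as a group like in the question, the individual task IDs can (as far as I know) be pulled out of the GroupResult and fed into poll_job_status. A small sketch, assuming result is the GroupResult returned by job.apply_async(...) in the question:

import time

# collect the ids of the child tasks from the GroupResult
active_tasks_list = [child.id for child in result.results]
while active_tasks_list:
    active_tasks_list = poll_job_status(active_tasks_list)
    time.sleep(5)  # avoid hammering the result backend
# carry on with other processes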

Renier

You can use a Celery group to invoke multiple tasks in parallel and wait for their results.

@app.task(name="app_celery.scraping_process", soft_time_limit=900, time_limit=960, max_retries=3)
def scraping_process():
    sources = ["a", "b", "c"]
    tasks =[]
    for src in sources:
        tasks.append(company_representation.s(src))

    # create a group with all the tasks
    job = group(tasks)
    result = job.apply_async(queue="spiders", routing_key="spiders")
    ret_val = result.get(disable_sync_subtasks=False)
    return ret_val
    
@app.task(name="app_celery.company_representation", max_retries=3)
def company_representation(source: str):
    # do something
    time.sleep(60)

Reference: http://ask.github.io/celery/userguide/groups.html#groups
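
A note on the design choice: disable_sync_subtasks=False is needed here because Celery by default raises an error when result.get() is called inside another task, since blocking a worker process on subtasks can deadlock the pool. If the waiting can happen outside the workers instead, the flag is not needed. A minimal sketch, assuming the tasks and queues from the question:

from celery import group

from app_celery import company_representation

# launched from a script or shell, not from inside a worker
result = group(
    company_representation.s(src) for src in ["a", "b", "c"]
).apply_async(queue="spiders", routing_key="spiders")
print(result.get(timeout=600))  # safe here: we are not blocking a worker process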

dassum