
I have 2 custom tasks (TaskA and TaskB), both of which inherit from celery.Task. The scheduler launches TaskA every now and then, and TaskA launches TaskB N times, with different arguments each time. But for some reason, the same TaskB with the same arguments is sometimes executed twice at the same time, and that causes various issues with the database.

class TaskA(celery.Task):

    def run(self, *args, **kwargs):
        objects = MyModel.objects.filter(processed=False)\
                                 .values_list('id', flat=True)
        task_b = TaskB()
        for o in objects:
            task_b.apply_async(args=[o, ])

class TaskB(celery.Task):

    def run(self, obj_id, *args, **kwargs):
        obj = MyModel.objects.get(id=obj_id)
        # do some stuff with obj

Things I've tried

I tried using celery.group in the hope that it would fix such issues, but all I got were errors saying that run takes 2 arguments and none were provided.

This is how I tried to launch TaskB using celery.group:

# somewhere in TaskA
task_b = TaskB()
g = celery.group([task_b.s(id) for id in objects])
g.apply_async()

I also tried it like this:

# somewhere in TaskA
task_b = TaskB()
g = celery.group([task_b.run(id) for id in objects])
g.apply_async()

which executed the tasks right there, before g.apply_async() (calling run() invokes the task body directly instead of creating a signature).

Question

Does the issue come from how I launch the tasks, or is it something else? Is this normal behaviour?

Additional Info

On my local machine I run celery 3.1.13 with RabbitMQ 3.3.4, and on the server celery 3.1.13 runs with Redis 2.8.9. On the local machine I see no such behaviour; every task is executed once. On the server I see anywhere between 1 and 10 such tasks being executed twice in a row.

This is how I run celery on local machine and on server:

celery_beat: celery -A proj beat -l info

celery1: celery -A proj worker -Q default -l info --purge -n default_worker -P eventlet -c 50

celery2: celery -A proj worker -Q long -l info --purge -n long_worker -P eventlet -c 200

Workaround that works

I introduced a lock on TaskB based on what arguments it received. After about 10 hours of testing, I can see exactly what is being executed twice, but the lock prevents collisions in the database. This does solve my issue, but I would still like to understand why it is happening.
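For anyone wondering how the lock works: below is a minimal sketch of the idea, not my exact code. It assumes a redis-py client is available; the key name and expiry are illustrative choices.

import celery
import redis

redis_client = redis.StrictRedis()
LOCK_EXPIRE = 60 * 10  # safety net so a dead worker can't hold the lock forever

class TaskB(celery.Task):

    def run(self, obj_id, *args, **kwargs):
        # MyModel is the same model as in the question above
        lock_key = 'lock:taskb:%s' % obj_id
        # SET NX EX is atomic: of two concurrent duplicates, only one gets the lock
        if not redis_client.set(lock_key, 'locked', nx=True, ex=LOCK_EXPIRE):
            return  # the same task with the same arguments is already running
        try:
            obj = MyModel.objects.get(id=obj_id)
            # do some stuff with obj
        finally:
            redis_client.delete(lock_key)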

Neara
  • Your code should work fine. I've copied [your code to one file like this](http://pastebin.com/f1gAf4R4) and all tasks were executed after calling `TaskA().apply_async()`. Could you post your traceback so we can see where the issue is? – daniula Jul 24 '14 at 00:10
  • The traceback comes from the database. `MyModel` has a unique constraint on 2 fields. So when the task runs the first time and creates a new object, it's all good, but then the same task runs again, tries to create the same object, and throws `IntegrityError`. – Neara Jul 24 '14 at 08:25
  • With the code you posted it's impossible to replicate your issue. I think you could try creating separate TaskB instances for each task, as that could be the issue. Try: `g = celery.group([TaskB().s(id) for id in objects])` – daniula Jul 24 '14 at 09:20
  • I tried that too, with the same results. The code I posted is pretty close to what my actual tasks are doing. I will switch to running celery on redis on my local machine; maybe that will give me better debug info on what is going on. – Neara Jul 24 '14 at 11:28
  • You probably want to watch this presentation: http://youtu.be/3cyq5DHjymw?t=24m20s – Anthony Kong Aug 11 '14 at 11:02
  • @AnthonyKong that was actually a very easy and straightforward presentation, thank you! – eugene Mar 19 '15 at 02:10
  • How did you introduce a lock on TaskB? It'd be helpful for people facing the same issue. – EML Apr 05 '15 at 08:15

1 Answer


Have you set the fanout_prefix and fanout_patterns as described in the Using Redis documentation for Celery? I am using Celery with Redis and I am not experiencing this problem.
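For reference, this is the setting from that section of the docs; with Celery 3.1 and the Redis broker it goes into the Celery configuration:

# From the "Using Redis" section of the Celery 3.1 docs: prefix and
# pattern-match the broadcast (fanout) messages so each worker only
# receives the fanout messages actually meant for it.
BROKER_TRANSPORT_OPTIONS = {
    'fanout_prefix': True,
    'fanout_patterns': True,
}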

Dag Høidahl
ThatAintWorking
  • I experienced this same problem, with one task queueing other tasks, and those other tasks being picked up and executed multiple times. Setting the `fanout_prefix` and `fanout_patterns` as described seems to have fixed the issue. Using Celery 3.1.18 and Kombu 3.0.30 – Will Keeling Apr 24 '17 at 12:42