
I have a Python Celery/Redis queue that processes uploads and downloads involving gigabytes of data at a time.

A few of the uploads take up to a few hours. However, once such a task finishes, I'm seeing this bizarre behaviour: Celery reruns the just-concluded task by sending it to the worker again (I'm running a single worker). It just happened twice on the same task!

Can someone help me understand why this is happening and how I can prevent it?

The tasks are definitely finishing cleanly with no errors reported; they're just extremely long-running.

user2252999

1 Answer


I recently ran into this issue, and eventually figured out that tasks were running multiple times because of a combination of task prefetching and tasks exceeding the visibility timeout. Tasks are acknowledged right before they're executed (unless you set `acks_late=True`), and by default 4 tasks are prefetched per worker process. The first task will be acknowledged before execution, but if it takes over an hour to run, the other prefetched tasks sit unacknowledged past the Redis transport's default one-hour visibility timeout, so they get redelivered to another worker and executed an additional time (or, in your case, executed an additional time by the same worker).

You can solve this by increasing the visibility timeout to something longer than the longest expected runtime of your tasks:

BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 3600*10}  # 10 hours
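On Celery 4 and newer, the equivalent with the lowercase setting names would be (a sketch, assuming an app object named `app`):

app.conf.broker_transport_options = {'visibility_timeout': 3600 * 10}  # 10 hours; applies to the Redis transport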

You could also set `CELERYD_PREFETCH_MULTIPLIER = 1` (`worker_prefetch_multiplier` in newer Celery) to disable prefetching, so that long-running tasks don't keep other prefetched tasks from being acknowledged.
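A minimal sketch combining both suggestions with the new-style setting names (assuming the same hypothetical `app`; as the first comment below points out, prefetching is only truly disabled together with `acks_late`):

app.conf.task_acks_late = True            # acknowledge after the task finishes, not before
app.conf.worker_prefetch_multiplier = 1   # reserve as few extra tasks as possible per worker process

Note that even with `acks_late`, the Redis transport redelivers any task whose runtime exceeds the visibility timeout, so raising the timeout as shown above is still necessary for hours-long tasks.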

SagarM
Jason V.
  • Actually, the prefetch multiplier can't be disabled unless you are using `acks_late`. If `acks_late` is disabled, the main process still reserves 1 additional task for every worker process (`--concurrency`). Moreover, it can reserve unlimited scheduled tasks (`celery inspect scheduled`) until it encounters enough tasks to feed its children (`celery inspect active`) and fill the main queue up to the prefetch multiplier (`celery inspect reserved`). See https://github.com/celery/celery/issues/6500 – Emilio Nov 18 '21 at 10:37
  • Note that the `visibility_timeout` only applies to the Redis transport. A similar param for RabbitMQ is the [`consumer_timeout`](https://www.rabbitmq.com/consumers.html#acknowledgement-timeout) – Emilio Nov 18 '21 at 10:39
  • @Emilio Facing the same problem: Celery keeps re-running the same tasks over and over again indefinitely. Can you please tell me how to set `consumer_timeout`, as I am using RabbitMQ? – codemastermind Jul 25 '22 at 19:41
  • Hi @codemastermind, the RabbitMQ official documentation for [`consumer_timeout`](https://www.rabbitmq.com/consumers.html#acknowledgement-timeout) explains how to do that. Basically, you should add the `consumer_timeout` parameter to your `rabbitmq.conf` file (and likely restart RabbitMQ or make it reload the config somehow). Notice that this has nothing to do with your Python or Django setup. – Emilio Jul 26 '22 at 21:21
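For reference, setting this in `rabbitmq.conf` would look like the line below (a sketch: the value is in milliseconds, `consumer_timeout` requires RabbitMQ 3.8.15 or newer, and 10 hours is just an assumed figure matching the Redis example above):

consumer_timeout = 36000000  # acknowledgement timeout in ms (10 hours)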