7

After a couple of days, my Celery service starts repeating a task over and over, indefinitely. This is somewhat difficult to reproduce, but it happens regularly, about once a week or more often depending on the volume of tasks being processed.

I would appreciate any tips on how to get more data about this issue, since I don't know how to trace it. When it occurs, restarting Celery solves it temporarily.

I have one Celery node running with 4 workers (version 3.1.23). The broker and result backend are both Redis. I'm posting to a single queue only, and I don't use Celery beat.

The config in Django's settings.py is:

BROKER_URL = 'redis://localhost:6380'
CELERY_RESULT_BACKEND = 'redis://localhost:6380'

Relevant part of the log:

[2016-05-28 10:37:21,957: INFO/MainProcess] Received task: painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647]
[2016-05-28 11:37:58,005: INFO/MainProcess] Received task: painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647]
[2016-05-28 13:37:59,147: INFO/MainProcess] Received task: painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647]
...
[2016-05-30 09:27:47,136: INFO/MainProcess] Task painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647] succeeded in 53.33468166703824s: None
[2016-05-30 09:43:08,317: INFO/MainProcess] Task painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647] succeeded in 466.0324719119817s: None
[2016-05-30 09:57:25,550: INFO/MainProcess] Task painel.tasks.indicar_cliente[defc87bc-5dd5-4857-9e45-d2a43aeb2647] succeeded in 642.7634702899959s: None

Tasks are sent by user request with:

tasks.indicar_cliente.delay(indicacao_db.id)

Here's the source code of the task and the celery service configuration.

Why are tasks being received multiple times after the service has been running for a while? How can I get consistent behavior?

rodorgas
  • Who is scheduling the task? Is it user triggered? cron triggered? Are you using celery beat? Are you posting to one queue only? Paste your celery config – Mauro Rocco May 31 '16 at 08:15
  • @MauroRocco Tasks are sent by user request (not scheduled and I don't use celery beat). I'm posting to one queue only. I've updated my question to include celery config and task source code. – rodorgas Jun 01 '16 at 01:36
  • Hi, so you are saying that if you run this locally and schedule only one task, you will see 3 tasks executing? Because if this is user triggered, then it is normal to have a task scheduled for each user requesting it. – Mauro Rocco Jun 01 '16 at 07:59
  • @MauroRocco Tasks typically follow the desired behavior. However, after the service has been running for a couple of days, tasks are received multiple times (more than 100 times in a short time span). It doesn't seem that users are triggering the tasks, because they are received with the same task ID (if users were requesting new tasks, the log would show different IDs). – rodorgas Jun 01 '16 at 11:39
  • Can you post the code that schedules the task ? – Mauro Rocco Jun 02 '16 at 07:56
  • @MauroRocco Please note that tasks are not scheduled, they are executed asynchronously using the `delay()` method. The code that sends a new task and the code that processes the tasks [is here](https://gist.github.com/rodorgas/71d556d69b18b018d35c1278c8d999c7) – rodorgas Jun 02 '16 at 17:38

3 Answers

8

It might be a bit out of date, but I faced the same problem with Redis and fixed it. Long story short, Celery waits a certain amount of time for a task to be acknowledged, and if that time expires it redelivers the task. This is called the visibility timeout. The explanation from the docs:

If a task isn’t acknowledged within the Visibility Timeout the task will be redelivered to another worker and executed. This causes problems with ETA/countdown/retry tasks where the time to execute exceeds the visibility timeout; in fact if that happens it will be executed again, and again in a loop. So you have to increase the visibility timeout to match the time of the longest ETA you’re planning to use. Note that Celery will redeliver messages at worker shutdown, so having a long visibility timeout will only delay the redelivery of ‘lost’ tasks in the event of a power failure or forcefully terminated workers.

Example of the option: https://docs.celeryproject.org/en/stable/userguide/configuration.html#broker-transport-options

Details: https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#id1
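As a minimal sketch, assuming the Celery 3.1 / Django setup from the question, raising the timeout could look like this in settings.py (43200 seconds, i.e. 12 hours, is only an illustrative value; it should exceed the longest ETA/countdown or run time you expect):

BROKER_URL = 'redis://localhost:6380'
CELERY_RESULT_BACKEND = 'redis://localhost:6380'

# A task is redelivered only if it is not acknowledged within this window,
# so the window must be longer than the slowest task you plan to run.
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 43200}  # illustrative: 12 hours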

Phuong Vu
Andrii Rusanov
    Just to add, there may be some fun interplay with "acks_late". Basically, if you set that to true, tasks are only acknowledged on completion. This means longer-running tasks can lead to this problem. Thanks for your answer here by the way. Super helpful. – PirateNinjas Nov 21 '19 at 15:53
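To illustrate the interplay described in the comment above, here is a hedged sketch (the app instance and the empty task body are placeholders; the task name is taken from the question's logs):

from celery import Celery

app = Celery('painel', broker='redis://localhost:6380')  # broker URL as in the question

# With acks_late=True the message is acknowledged only when the task finishes,
# so a task whose runtime exceeds the visibility timeout gets redelivered even
# though a worker is still busy processing it.
@app.task(acks_late=True)
def indicar_cliente(indicacao_id):
    ...  # long-running work; keep visibility_timeout above its worst-case duration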
1

Solved by using the RabbitMQ broker instead of Redis.
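For context, RabbitMQ uses native message acknowledgements rather than the Redis transport's visibility-timeout emulation, so unacknowledged messages are only redelivered when a worker connection drops, not after a timer expires. A sketch of the corresponding settings change (the amqp URL is the stock example from the Celery docs, not the asker's actual credentials):

BROKER_URL = 'amqp://guest:guest@localhost:5672//'
# The result backend can stay on Redis.
CELERY_RESULT_BACKEND = 'redis://localhost:6380'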

rodorgas
0

I ran into an issue like this. Raising the Celery visibility timeout was not working.

It turns out that I was also running a Prometheus exporter that instantiated its own Celery object using the default visibility timeout, which canceled out the higher timeout I had set in my application.

If you have multiple Celery clients (whether they submit tasks, process tasks, or just observe them), make sure they all have exactly the same configuration.
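A minimal sketch of that advice, with made-up names and an illustrative timeout value (this uses the modern lowercase setting names; the question's Celery 3.1 would use the uppercase BROKER_TRANSPORT_OPTIONS form instead):

from celery import Celery

# One shared set of transport options for every client of the same broker.
TRANSPORT_OPTIONS = {'visibility_timeout': 43200}  # illustrative: 12 hours

# Main application (workers and task producers).
app = Celery('painel', broker='redis://localhost:6380')
app.conf.broker_transport_options = TRANSPORT_OPTIONS

# Any additional client, e.g. a monitoring exporter, must use the same options;
# a second Celery() left at the default visibility timeout can trigger
# redeliveries even though the main app is configured correctly.
exporter = Celery('exporter_example', broker='redis://localhost:6380')
exporter.conf.broker_transport_options = TRANSPORT_OPTIONS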

James Mishra