
I'm still trying to understand this myself. I run long tasks (20 minutes to 2 hours each), and I keep hitting a strange scenario: after 15-20 minutes, my Celery workers go from status=online to status=offline, yet they still report active=1.

After that, I see the same task start on another Celery worker, and the process repeats. This continues until the same task is running three times simultaneously on different workers, all of which eventually show as offline with active=1.

  • What does it mean for a Celery worker to have status=offline with active=1?
  • What could cause a worker to end up in this state?
Jasmijn
Francisco Albert
  • According to the documentation (https://docs.celeryproject.org/en/stable/userguide/monitoring.html?highlight=offline#worker-offline), a worker sends a heartbeat every minute. If no heartbeat has been received for 2 minutes, the worker is considered offline, i.e. disconnected from the broker. According to the Worker class (https://docs.celeryproject.org/en/stable/_modules/celery/events/state.html#Worker), the active attribute is set in `__init__`. It looks like active is not a good indicator of the worker's state. Could the worker have failed or been disconnected from the broker? – Jonas Jul 22 '20 at 11:29
  • It hasn't failed, because I can see the expected output from that worker in Kibana, which tells me the task keeps running properly. It sounds like it disconnected, but why wouldn't it send a heartbeat? – Francisco Albert Jul 22 '20 at 12:04
  • This sounds like an issue worth debugging. Maybe the worker is "too busy" to send a heartbeat. This could be due to some blocking call that runs as part of the task (e.g. a network operation). I'm not well versed in the internals of Celery. Do Celery workers spawn a separate thread for each task to avoid blocking the communication between worker and broker? – Jonas Jul 22 '20 at 12:30
  • Your issue could be related to https://github.com/celery/celery/issues/5157 and https://github.com/celery/celery/issues/4758 – Jonas Jul 22 '20 at 12:33
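
To make the heartbeat rule quoted in the comments concrete, here is a minimal sketch of how a monitor could end up showing status=offline together with active=1. It is plain Python, not Celery's actual implementation: the `WorkerView` class, its fields, and the two constants are hypothetical names chosen for illustration, and the only facts taken from the thread are the one-minute heartbeat and the two-minute offline threshold. The point is that, in this model, status depends only on heartbeat age while active is simply the last value the monitor heard.

```python
from dataclasses import dataclass

# Values taken from the documentation quoted in the comments above.
HEARTBEAT_INTERVAL = 60  # seconds: a worker sends a heartbeat every minute
OFFLINE_AFTER = 120      # seconds: no heartbeat for 2 minutes means "offline"


@dataclass
class WorkerView:
    """Hypothetical monitor-side record of one worker (not Celery's class)."""

    last_heartbeat: float  # timestamp of the last heartbeat the monitor saw
    active: int = 0        # last known number of running tasks

    def status(self, now: float) -> str:
        # Status is derived from heartbeat age only; `active` is never consulted.
        if now - self.last_heartbeat > OFFLINE_AFTER:
            return "offline"
        return "online"


# Assumed scenario: the worker picks up a long task at t=0 and then
# stops delivering heartbeats, although the task itself keeps running.
w = WorkerView(last_heartbeat=0.0, active=1)
print(w.status(now=90.0), w.active)    # heartbeat is 90 s old
print(w.status(now=1200.0), w.active)  # heartbeat is 20 minutes old
```

Under these assumptions the stale active=1 says nothing about whether the worker is reachable, which fits the observation that the task output keeps arriving in Kibana while the worker is listed as offline.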

0 Answers