Context: I'm using Celery 4.4.0 with Python 2. My system runs the same job every second. I deploy Celery on Google Cloud Kubernetes, with 1 Redis pod as the broker and 2 replica pods of the Celery app. The Celery pods are identical: they use the same codebase and the same broker. Each pod runs both a beat and a worker.

Problem: After running for several days, tasks stop being executed, although the beats still queue tasks every second. If I restart the pods, they work fine for the next few days and then get stuck in the same way again.

My code:

celery worker \
  --app scheduler \
  --without-mingle \
  --without-gossip \
  --loglevel=DEBUG \
  --queues my_queue \
  --concurrency=1 \
  --max-tasks-per-child=1 \
  --beat \
  --pool=solo

from celery import Celery

app = Celery(fixups=[])
app.conf.update(
    CELERYD_HIJACK_ROOT_LOGGER=False,
    CELERYD_REDIRECT_STDOUTS=False,
    CELERY_TASK_RESULT_EXPIRES=1200,
    BROKER_URL='redis://redis.default.svc.cluster.local:6379/0',
    BROKER_TRANSPORT='redis',
    CELERY_RESULT_BACKEND='redis://redis.default.svc.cluster.local:6379/0',
    CELERY_TASK_SERIALIZER='json',
    CELERY_ACCEPT_CONTENT=['json'],
    CELERYBEAT_SCHEDULE={
        'my_task': {
            'task': 'tasks.my_task',
            'schedule': 1.0, # every 1 sec
            'options': {'queue': 'my_queue'},
        }
    }
)


@task(
    name='tasks.my_task',
    soft_time_limit=config.ENRCelery.max_soft_time_limit,
    time_limit=config.ENRCelery.max_time_limit,
    bind=True)
def my_task(self):
    print "TRIGGERED"

Logs when tasks are stuck:

# every second

beat: Waking up now. | beat:633
Scheduler: Sending due task my_task (tasks.my_task) | beat:271
tasks.my_task sent. id->97d7837d-3d8f-4c1f-b30e-d2cac0013531
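
The log above shows beat publishing every second, so one way to check whether messages are piling up in the broker instead of being consumed is to sample the queue length over time. This is a minimal sketch, not a definitive diagnostic. It assumes the Redis transport keeps pending messages for `my_queue` in a Redis list of the same name (the default when no priorities are used) and that `client` is a redis-py style object exposing `llen`. The helper names `sample_depths` and `backlog_is_growing` are hypothetical.

```python
import time


def sample_depths(client, queue_name, samples=3, interval=1.0,
                  sleep=time.sleep):
    """Return `samples` successive lengths of the Redis list `queue_name`."""
    depths = []
    for i in range(samples):
        # LLEN reports how many messages are waiting in the list.
        depths.append(int(client.llen(queue_name)))
        if i < samples - 1:
            sleep(interval)
    return depths


def backlog_is_growing(depths):
    """True if the queue never shrank and ended longer than it started."""
    never_shrank = all(b >= a for a, b in zip(depths, depths[1:]))
    return never_shrank and depths[-1] > depths[0]
```

With the `redis` package installed, `client` could be created with `redis.StrictRedis.from_url(...)` using the broker URL from the configuration above. A steadily growing depth would suggest the worker has stopped consuming while beat keeps publishing.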

I don't think running a beat and a worker in each pod is the problem, because I don't care if tasks are triggered more than once.

Do you have any clue? Any help will be appreciated. Thank you in advance.

1 Answer

We use a Celery-based third-party app with Azure Redis as the broker. After a while, the app simply stops scheduling new tasks: it goes into an idle state in which you only see the beat logging every minute and nothing else happens. Restarting the workers is the only workaround we have found so far, but it is far from ideal.
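
Until the root cause is found, the restart workaround can at least be automated. A minimal sketch, assuming each task run records a timestamp in a file and an external check (for example a Kubernetes exec liveness probe) treats a stale file as a reason to restart the pod. The path `/tmp/celery_heartbeat`, the 60-second threshold, and the function names are all hypothetical.

```python
import os
import time

HEARTBEAT_FILE = "/tmp/celery_heartbeat"  # hypothetical path
MAX_AGE_SECONDS = 60.0  # hypothetical staleness threshold


def touch_heartbeat(path=HEARTBEAT_FILE, now=None):
    """Call from inside the task body to record that a task just ran."""
    stamp = time.time() if now is None else now
    with open(path, "w") as handle:
        handle.write(repr(stamp))


def heartbeat_is_fresh(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS,
                       now=None):
    """Return True if a task recorded a heartbeat within `max_age` seconds."""
    current = time.time() if now is None else now
    try:
        with open(path) as handle:
            last = float(handle.read().strip())
    except (IOError, OSError, ValueError):
        # A missing or unreadable file counts as stale.
        return False
    return (current - last) <= max_age
```

A probe command could exit non-zero whenever `heartbeat_is_fresh()` returns `False`. A freshly started pod has no heartbeat file yet, so such a probe would need an initial delay before its first check.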