
Context:

  • I am trying to write a graceful shutdown for my celery application

  • The logic is that when I receive a SIGTERM signal, I stop (revoke) the tasks currently being run by Celery and then exit the main worker process

  • I am trying to achieve this by registering a SIGTERM handler inside a handler for the "worker_ready" Celery signal

  • (For dev testing, I do not exit or raise at the end of SIGTERM handler (sigterm_handler), so we do not kill worker process at the end)
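
The intended flow can be sketched roughly as below. This is only a minimal sketch under assumptions: the helper name `revoke_active_tasks` is hypothetical, `control` stands for the app's `celery.control` object, and `hostname` is the node name of the worker receiving SIGTERM. It relies on `inspect(...).active()` returning `None` when no worker replies within the timeout.

```python
def revoke_active_tasks(control, hostname, timeout=2):
    """Revoke every task that `hostname` reports as active.

    `control` is assumed to behave like `celery.control`
    (hypothetical helper, not part of Celery itself).
    Returns the list of task ids that were revoked.
    """
    # Ask only the node we care about for its running tasks.
    replies = control.inspect(destination=[hostname], timeout=timeout).active()

    # active() yields None if no worker answered within the timeout,
    # and the node's key is absent if that particular node did not reply.
    tasks = (replies or {}).get(hostname, [])

    revoked = []
    for task in tasks:
        # terminate=True also stops a task that is already executing.
        control.revoke(task["id"], terminate=True, signal="SIGTERM")
        revoked.append(task["id"])
    return revoked
```

Called from the SIGTERM handler, this would return an empty list whenever the node does not answer the inspect broadcast, which is exactly the situation described in the problem below.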


Problem:

To obtain the list of tasks currently being run by the Celery worker, I use the "celery.control.inspect.active" method

  • This method works as expected before sending the SIGTERM signal

  • But as soon as I send SIGTERM, I lose the worker stats

  • I am unable to get output for inspect commands for the node


Debugging:

(there are multiple workers, focus on 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr'):

  1. Before sending TERM signal
>>> celery.control.inspect(timeout=2).active()

{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [{
        'id': '63eb332cf88bdc58f865d48e',
        'name': 'execute_task',
        'args': [],
        'kwargs': {},
        'type': 'execute_task',
        'hostname': 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr',
        'time_start': 1676358444.9854753,
        'acknowledged': True,
        'delivery_info': {
            'exchange': '',
            'routing_key': 'sync',
            'priority': 10,
            'redelivered': False
        },
        'worker_pid': 191
    }],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
  2. JUST after sending TERM signal
>>> celery.control.inspect(timeout=2).active()

{
      'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
      'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
      'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
  3. After sigterm_handler finishes execution
>>> celery.control.inspect(timeout=2).active()

{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
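
The three outputs above differ in whether the node is missing from the reply or present with an empty list. A small sketch of how that distinction could be checked in code; the helper name `node_status` and its return strings are hypothetical, and the sample replies are shortened versions of the outputs above.

```python
def node_status(replies, hostname):
    """Classify one node from the dict returned by inspect().active().

    Hypothetical helper: returns "no-reply" if the node is absent
    (or nothing replied at all), "busy" if it reports active tasks,
    and "idle" if it replied with an empty list.
    """
    if not replies or hostname not in replies:
        return "no-reply"
    return "busy" if replies[hostname] else "idle"


NODE = "asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr"

# Shortened versions of the three replies shown above.
before_term = {NODE: [{"id": "63eb332cf88bdc58f865d48e"}]}
just_after_term = {"fast@taskworker-fast-5f9d8b9849-wx9z4": []}
after_handler = {NODE: []}

print(node_status(before_term, NODE))      # busy
print(node_status(just_after_term, NODE))  # no-reply
print(node_status(after_handler, NODE))    # idle
```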

Sample Code (stripped):

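from celery.platforms import signals
from celery.signals import worker_ready

# `celery` is the Celery app instance (its definition is stripped here).


def sigterm_handler(*args, **kwargs):
    # Ask the workers which tasks they are currently executing.
    active_tasks = celery.control.inspect(timeout=2).active()

    print(active_tasks)


@worker_ready.connect
def on_worker_ready(**kwargs):
    # Renamed so the function does not shadow the imported worker_ready signal.
    signals['TERM'] = sigterm_handler
    # signal.signal(signal.SIGTERM, sigterm_handler)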

I tried debugging this by exec-ing into the K8s pod.

I ran celery inspect commands in the pod's celery shell and was able to verify that when we send the TERM signal, we lose the details of the celery node/host.
