Context:
I am trying to implement a graceful shutdown for my Celery application.
The logic is that when the worker receives a SIGTERM signal, I stop (revoke) the tasks it is currently running and then exit the main worker process.
I am trying to achieve this by registering a SIGTERM handler inside a "worker_ready" Celery signal handler.
(For dev testing, I do not exit or raise at the end of the SIGTERM handler (sigterm_handler), so the worker process is not killed when the handler returns.)
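To make the intended logic concrete, here is a minimal sketch of the revoke step. It assumes `control` is the app's `celery.control` object; `revoke_active_tasks` and `node_name` are placeholder names for illustration, not part of my real code.

```python
def revoke_active_tasks(control, node_name, timeout=2):
    """Revoke every task that `node_name` reports as active; return their ids."""
    # inspect().active() returns None when no worker replies within the timeout.
    reply = control.inspect(timeout=timeout).active() or {}
    revoked = []
    # A node that did not reply is simply absent from the dict.
    for task in reply.get(node_name) or []:
        # terminate=True asks the worker to also stop the process running the task.
        control.revoke(task["id"], terminate=True)
        revoked.append(task["id"])
    return revoked
```

The idea is to call this from sigterm_handler with the worker's own node name, which only works if the node still answers inspect requests at that point.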
Problem:
To obtain the list of tasks currently being run by the Celery worker, I use the "celery.control.inspect().active()" method.
This method works as expected before the SIGTERM signal is sent.
But as soon as I send SIGTERM, the worker node disappears from the reply.
I am unable to get any output from inspect commands for that node.
Debugging:
(There are multiple workers; focus on 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr'.)
- Before sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [{
        'id': '63eb332cf88bdc58f865d48e',
        'name': 'execute_task',
        'args': [],
        'kwargs': {},
        'type': 'execute_task',
        'hostname': 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr',
        'time_start': 1676358444.9854753,
        'acknowledged': True,
        'delivery_info': {
            'exchange': '',
            'routing_key': 'sync',
            'priority': 10,
            'redelivered': False
        },
        'worker_pid': 191
    }],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
- JUST after sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
- After sigterm_handler finishes execution
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
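To spell out how I am reading the three replies above, here is a small sketch that separates "node missing from the reply" from "node present but idle". `summarize_node` is a hypothetical helper written only for this post.

```python
def summarize_node(active_reply, node_name):
    """Classify one node in the dict returned by inspect().active()."""
    # A None reply or a missing key means the node did not answer in time.
    if not active_reply or node_name not in active_reply:
        return "no reply"
    tasks = active_reply[node_name]
    # An empty list means the node answered and is running nothing.
    if not tasks:
        return "idle"
    return "busy: " + ", ".join(task["id"] for task in tasks)
```

With this reading, the node is busy before the TERM signal, gives no reply just after it, and is idle once sigterm_handler has finished.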
Sample Code (stripped):
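from celery.platforms import signals
from celery.signals import worker_ready

# `celery` is the application's Celery instance (its creation is omitted here).

def sigterm_handler(*args, **kwargs):
    # Ask the workers which tasks they are currently executing.
    active_tasks = celery.control.inspect(timeout=2).active()
    print(active_tasks)

@worker_ready.connect
def on_worker_ready(**kwargs):
    # Install the TERM handler once the worker has started.
    signals['TERM'] = sigterm_handler
    # signal.signal(signal.SIGTERM, sigterm_handler)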
I tried debugging this by exec-ing into the K8s pod.
I ran the Celery inspect commands from the Celery shell inside the pod and was able to verify that when we send the TERM signal, we lose the details of the Celery node/host.