
I have a workload running on 16 instances, which can all communicate with each other (verified with ping). Each of them runs a long-running task and was started like this:

nohup celery worker -A tasks.workers --loglevel=INFO --logfile=/dockerdata/log/celery.log --concurrency=7 >/dev/null 2>&1 &
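Because the worker is started detached with `nohup ... &`, nothing reports it when it exits. A minimal sketch for noticing that, assuming you record the worker's PID somewhere, for example with Celery's `--pidfile` option or by saving `$!` right after the command above (the pidfile location is up to you; `read_pidfile` and `worker_is_alive` are hypothetical helper names, POSIX only):

```python
import os


def read_pidfile(path: str) -> int:
    """Read a PID from a text file containing just the number (hypothetical helper)."""
    with open(path) as fh:
        return int(fh.read().strip())


def worker_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only).

    Sending signal 0 performs the existence/permission check without
    delivering any signal. A PID can be reused after the original process
    exits, so this is a cheap heuristic, not a guarantee.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the worker has exited
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

For example, a cron job could call `worker_is_alive(read_pidfile(...))` periodically and record the time it first returns `False`, which narrows down where to look in `celery.log`.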

However, after a while, Celery always stops running on a few of the instances. The log directory normally keeps every day's log, so I checked the last day's log on each of these instances and found the following:

worker exited by signal SIGKILL

[2021-07-23 09:04:24,270: ERROR/MainProcess] Process 'ForkPoolWorker-19773' pid:2846586 exited with 'signal 9 (SIGKILL)'
[2021-07-23 09:04:24,281: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 79074.')
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
    raise WorkerLostError(
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 79074.
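To see how often and when this happens on each instance, the kill messages can be pulled out of the log. A minimal sketch that assumes only the exact line format shown above (`find_sigkilled_children` and `KilledChild` are hypothetical names):

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple

# Matches the billiard pool message shown above, e.g.
# [2021-07-23 09:04:24,270: ERROR/MainProcess] Process 'ForkPoolWorker-19773' pid:2846586 exited with 'signal 9 (SIGKILL)'
SIGKILL_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): ERROR/MainProcess\] "
    r"Process '(?P<name>[^']+)' pid:(?P<pid>\d+) exited with 'signal 9 \(SIGKILL\)'"
)


class KilledChild(NamedTuple):
    when: datetime  # timestamp written by Celery's logger
    name: str       # pool child name, e.g. ForkPoolWorker-19773
    pid: int        # PID of the killed child process


def find_sigkilled_children(lines: Iterable[str]) -> Iterator[KilledChild]:
    """Yield one KilledChild for every 'exited with signal 9' line."""
    for line in lines:
        m = SIGKILL_RE.match(line)
        if m:
            yield KilledChild(
                when=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
                name=m["name"],
                pid=int(m["pid"]),
            )
```

Usage: `with open("/dockerdata/log/celery.log") as fh: print(list(find_sigkilled_children(fh)))`. The PIDs it returns (here `2846586`) are what you would then search for in the kernel log.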

missed heartbeat from...

[2021-07-30 10:24:26,815: INFO/MainProcess] missed heartbeat from celery@instance-1
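My understanding is that Celery logs this message when a worker stops receiving heartbeat events from another node, so the node named at the end of the line (here `celery@instance-1`) is the one that went silent, not the one writing the log. A minimal sketch that tallies these messages per node across the logs, assuming the line format above (`count_missed_heartbeats` is a hypothetical name):

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines like:
# [2021-07-30 10:24:26,815: INFO/MainProcess] missed heartbeat from celery@instance-1
MISSED_RE = re.compile(r"missed heartbeat from (?P<node>\S+)")


def count_missed_heartbeats(lines: Iterable[str]) -> Counter:
    """Count 'missed heartbeat' messages per remote worker node name."""
    counts: Counter = Counter()
    for line in lines:
        m = MISSED_RE.search(line)
        if m:
            counts[m["node"]] += 1
    return counts
```

If the nodes with the most missed heartbeats are the same instances whose workers stopped, the heartbeat messages are a symptom of the SIGKILL problem rather than a separate one.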

I suspect that Celery stopping is related to the two messages above. Can anyone suggest a solution to this problem?
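If the SIGKILLs turn out to come from memory pressure (a common cause, though nothing in these logs proves it), one mitigation often used for long-running Celery workers is to recycle pool children and let tasks survive a killed child. A minimal sketch of a configuration module, assuming Celery 4 or later (lowercase setting names) loaded with `app.config_from_object("celeryconfig")`; the module name and the numbers are hypothetical:

```python
# celeryconfig.py: a hypothetical configuration module, loaded with
#   app.config_from_object("celeryconfig")
# The values are illustrative starting points, not tuned recommendations.

# Replace each pool child after it has executed this many tasks, so slow
# memory growth in long-running workers is reset periodically.
worker_max_tasks_per_child = 100

# Replace a pool child once its resident memory exceeds this limit.
# Celery interprets this value in kilobytes (here roughly 1 GB). The check
# happens after a task finishes, so it cannot stop a single task from
# using more than this.
worker_max_memory_per_child = 1_000_000

# Acknowledge a task only after it finishes, and re-queue it if the child
# executing it dies abruptly (e.g. SIGKILL) instead of marking it failed.
task_acks_late = True
task_reject_on_worker_lost = True
```

The first two limits are also available as the worker flags `--max-tasks-per-child` and `--max-memory-per-child`. Be careful with `task_reject_on_worker_lost`: a task that reliably gets its child killed will be re-queued again and again, which can turn into a message loop.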

enlighten
  • Upvoted, but I feel the question lacks necessary details. What else do the logs contain? – Suthiro Aug 11 '21 at 05:47
  • Can you specify which details are needed? The rest of the log is mainly about task progress, and errors in the code itself are caught and logged without affecting the program. I think the main problem lies in those two messages. – enlighten Aug 11 '21 at 06:44
  • These messages are the result. Maybe there are other messages in the log indicating what went wrong. – Suthiro Aug 11 '21 at 07:25
  • I searched for a long time, but found no useful information. What are the possible causes of the two problems mentioned above? – enlighten Aug 11 '21 at 07:48
  • Is this [answer](https://stackoverflow.com/questions/22805079/celery-workerlosterror-worker-exited-prematurely-signal-9-sigkill) relevant to your question? – Suthiro Aug 11 '21 at 09:22
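One common source of `signal 9 (SIGKILL)` on Linux is the kernel's OOM killer, which reports itself only in the kernel log (`dmesg` or `journalctl -k` on the host), not in `celery.log`. A minimal sketch for checking that, assuming the OOM lines contain `Killed process <pid>` (the exact wording varies between kernel versions, so treat the pattern as an assumption) and that you already have the child PIDs from the Celery log (`oom_killed_pids` and `explained_by_oom` are hypothetical names):

```python
import re
from typing import Iterable, Set

# Assumed OOM-killer wording, e.g.
# "Out of memory: Killed process 1234 (celery) total-vm:..."
OOM_RE = re.compile(r"Killed process (?P<pid>\d+)")


def oom_killed_pids(kernel_log: Iterable[str]) -> Set[int]:
    """Collect PIDs that the kernel log reports as killed by the OOM killer."""
    pids: Set[int] = set()
    for line in kernel_log:
        m = OOM_RE.search(line)
        if m:
            pids.add(int(m["pid"]))
    return pids


def explained_by_oom(celery_pids: Iterable[int], kernel_log: Iterable[str]) -> Set[int]:
    """Return the Celery child PIDs that also appear in the OOM-killer output."""
    return set(celery_pids) & oom_killed_pids(kernel_log)
```

PIDs are reused, so a match is a strong hint rather than proof; checking that the times line up as well (the Celery line for `pid:2846586` is at 2021-07-23 09:04:24) makes it more convincing. Inside a Docker container the kernel log may not be readable, in which case this has to run on the host.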
