I am loading NLP models on the GPU to do inference, but once the inference is over the GPU does not deallocate its memory:
But then the command ps -a | grep python gave me:
How do I solve this issue?
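For reference, a typical cleanup after inference looks roughly like the sketch below (assuming the models are PyTorch modules on a CUDA device; the Linear layer and random tensors are only placeholders, not my actual model). Note that the memory held by the CUDA context itself is only returned to the driver once the process exits.

```python
import gc
import torch

# Minimal sketch, assuming a PyTorch model on a CUDA device; the Linear layer
# and random inputs below are stand-ins for the real NLP model and data.
model = torch.nn.Linear(768, 2).cuda()
inputs = torch.randn(8, 768, device="cuda")

with torch.no_grad():
    outputs = model(inputs).cpu()   # move results off the GPU

del model, inputs                   # drop the last GPU references
gc.collect()                        # collect lingering Python references
torch.cuda.empty_cache()            # return cached blocks to the driver
```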
I'm having a similar problem: a PyTorch process on the GPU became a zombie and left GPU memory in use. Furthermore, in my case the process showed 100% usage on the GPU (GPU-Util in the nvidia-smi output). The only solution I have found so far is rebooting the system.
In case you want to try other solutions, these are the things I tried before rebooting (without success); a scripted version of the process check is sketched after the list:
- Checking that the zombie process had been re-parented to init (pid=1). init should reap zombie processes automatically, but this did not happen in my case (the process could still be found with ps, and the GPU memory was not freed).
- Sending SIGCHLD to init (command: kill -17 1) to force reaping, but init still did not reap the process, and the GPU memory remained in use.
- Running fuser -v /dev/nvidia* to see what else was holding the GPU, but no other python processes were found in my case (other than the original zombie process).
- Killing the processes attached to /dev/nvidia0 by running fuser -k /dev/nvidia0. This did not affect the zombie process.
- Running nvidia-smi --gpu-reset -i <device>, but this threw "device is currently being used by one or more other processes... Please first kill all processes using this device..."
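If you want to script the zombie-process check, something along these lines works; psutil is an assumption on my part (I used ps and fuser directly):

```python
import psutil

# Minimal sketch: list python processes in the zombie state, the same check
# done above with ps. psutil is an assumption; the original used ps directly.
for proc in psutil.process_iter(["pid", "ppid", "name", "status"]):
    name = proc.info["name"] or ""
    if "python" in name and proc.info["status"] == psutil.STATUS_ZOMBIE:
        print(f"zombie python process: pid={proc.info['pid']}, ppid={proc.info['ppid']}")
```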
In the end, the only solution was rebooting the system.
I'm not sure what caused the error in the first place. I had a PyTorch script training on a single GPU, and I have used the same script many times without issue. I used a DataLoader with num_workers=5, which I suspect may have been the culprit, but I cannot be sure. The process suddenly just hung, without throwing an exception or anything, and left the GPU unusable.
I'm using pytorch 1.7.1+cu110 and nvidia-driver 455.45.01, running on Ubuntu 18.04.
I killed all python processes (pkill python), and the zombies are no longer on the GPU. I was using torch.