When I run nvidia-smi, nearly 20GB of GPU memory is unaccounted for (the listed processes total 17745MB, while Memory-Usage shows 37739MB).

Then I ran nvitop, which shows that an entry labeled No Such Process is holding my GPU memory. However, I cannot kill this PID:

>>> sudo kill -9 118238
kill: (118238): No such process

How can I get rid of this ghost process without interrupting the others?

nguyendhn

1 Answer

I found the solution in this answer: https://stackoverflow.com/a/59431785/6563277.

First, I ran sudo fuser -v /dev/nvidia* to list every process using the GPU device files, including those that nvidia-smi failed to show.

The output included some "ghost" Python processes. After I killed them, the GPU RAM was freed.
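
The two steps above can be sketched as a small shell helper. This is only an illustration, assuming fuser (from psmisc) and a procps-style ps are installed. The function names unique_pids, describe_pids and gpu_holders are made up for this example. It calls fuser without -v so that only the PIDs reach stdout (fuser prints file names and access flags on stderr).

```shell
#!/bin/sh
# Sketch: find processes that hold an NVIDIA device file open,
# including ones that nvidia-smi does not attribute to a live PID.
# All function names here are hypothetical helpers for illustration.

# Read whitespace-separated tokens on stdin and print the unique
# numeric ones (the PIDs), one per line, in ascending order.
unique_pids() {
    tr -s ' \t' '\n' | grep -E '^[0-9]+$' | sort -un
}

# Print PID, owner, elapsed time and command line for each PID given,
# so you can check what a process is before deciding to kill it.
describe_pids() {
    local pid
    for pid in "$@"; do
        ps -o pid=,user=,etime=,args= -p "$pid" 2>/dev/null \
            || echo "$pid (no such process)"
    done
}

# PIDs of everything that has /dev/nvidia* open. sudo is needed to
# see processes owned by other users.
gpu_holders() {
    sudo fuser /dev/nvidia* 2>/dev/null | unique_pids
}

# Manual usage:
#   describe_pids $(gpu_holders)
#   sudo kill <PID>    # plain SIGTERM first; -9 only if it is ignored
```

Checking each command line with describe_pids before killing anything helps avoid interrupting other users' jobs on a shared machine.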

nguyendhn