nvidia-smi shows the following, indicating 3.77 GB utilized on GPU 0, but no processes are listed for GPU 0:

(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   34C    P8     9W / 250W |   3771MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:84:00.0  On |                  N/A |
| 38%   62C    P8    24W / 250W |   2295MiB / 12188MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1910      G   /usr/lib/xorg/Xorg                           105MiB |
|    1      2027      G   /usr/bin/gnome-shell                          51MiB |
|    1      3086      G   /usr/lib/xorg/Xorg                          1270MiB |
|    1      3237      G   /usr/bin/gnome-shell                         412MiB |
|    1     30593      G   /proc/self/exe                               286MiB |
|    1     31849      G   ...quest-channel-token=4371017438329004833   164MiB |
+-----------------------------------------------------------------------------+

Similarly, nvtop shows the same GPU RAM utilization, and the processes it lists have TYPE=Compute, but if you try to kill the PIDs it shows, you get an error:

(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process

How can I reclaim the GPU RAM occupied by these apparently ghost processes?

1 Answer

Use the following command to get insight into ghost processes occupying GPU RAM:

sudo fuser -v /dev/nvidia*

In my case, the output is:

(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        shitals     517 F.... nvtop
                     root       1910 F...m Xorg
                     gdm        2027 F.... gnome-shell
                     root       3086 F...m Xorg
                     shitals    3237 F.... gnome-shell
                     shitals   27808 F...m python
                     shitals   27809 F...m python
                     shitals   27813 F...m python
                     shitals   27814 F...m python
                     shitals   28091 F...m python
                     shitals   28092 F...m python
                     shitals   28096 F...m python

This shows processes that both nvidia-smi and nvtop fail to show. After I killed all of the python processes, the GPU RAM was freed up.
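For reference, freeing the memory boiled down to killing those PIDs directly. The PIDs below are copied from my fuser output above; check your own listing first so you don't kill Xorg or gnome-shell, which also hold /dev/nvidia*:

kill -9 27808 27809 27813 27814 28091 28092 28096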

Another thing to try is to reset the GPU using the command:

sudo nvidia-smi --gpu-reset -i 0
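
The reset will only go through once nothing is still holding the GPU, so afterwards it is worth re-checking that the memory has actually been released, e.g.:

nvidia-smi -i 0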