
I use Theano for some deep learning experiments. I killed a process that had been running for 3 weeks with Ctrl+C so that I could start a new one.

However, it seems the GPU memory was not released, even though I killed the process. According to nvidia-smi, the memory is free apart from a small 23 MB usage. I am using a Tesla K40.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40m          Off  | 0000:85:00.0     Off |                    0 |
| N/A   24C    P8    21W / 235W |     23MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      2873    G   /usr/lib/xorg/Xorg                              23MiB |
+-----------------------------------------------------------------------------+
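The memory figures in the table above can also be read in machine-readable form. This is a minimal sketch that assumes Python 3.7+ and uses nvidia-smi's CSV query mode (`--query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`). The helper names `parse_memory_csv` and `query_gpu_memory` are hypothetical and were chosen only for illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse nvidia-smi CSV output of the form 'index, used, total' (MiB,
    no header, no units) into a list of (index, used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return per-GPU memory figures in MiB."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)
```

For the table shown, the figures correspond to 23 MiB used out of 11439 MiB on GPU 0. This reports only what the driver sees, so by itself it will not explain the memory error.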

But in practice, when I try to run even very small datasets, I get a memory error. If only 23 MB were in use, this should not be a problem at all.

I don't have sudo privileges on the machine I am using. How can I fix this problem?
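One common way to check for leftover processes without sudo (this approach is not suggested in the thread) is to look for processes you own that still have an NVIDIA device file (`/dev/nvidia*`) open. This is roughly what `fuser -v /dev/nvidia*` reports. The following is a minimal sketch that assumes Linux with a `/proc` filesystem. `find_gpu_holders` is a hypothetical helper name, and the `proc_root` parameter exists only so the scan can be pointed at a test directory.

```python
import os


def find_gpu_holders(proc_root="/proc", uid=None):
    """Return {pid: [device paths]} for processes owned by `uid` (default: the
    current user) that hold a /dev/nvidia* file open. Without root, only your
    own processes' file descriptors are readable, so others are skipped."""
    if uid is None:
        uid = os.getuid()
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue  # not our process
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or its fd table is not readable
        targets = []
        for fd in fds:
            try:
                targets.append(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                pass  # descriptor was closed while we were scanning
        devices = sorted({t for t in targets if t.startswith("/dev/nvidia")})
        if devices:
            holders[int(entry)] = devices
    return holders
```

Every PID this reports belongs to you, so `kill <pid>` (or `kill -9 <pid>`) should work without sudo. Processes owned by other users are skipped.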

yusuf
  • I'm reasonably sure that `sudo` won't help here. Even if you can `sudo`, you can't (trivially) free memory, whether on the graphics card or in main memory. You would need to know exactly which allocations can be freed (i.e., which are guaranteed not to be used again). But 23 MB is not high usage when you have 11 GB available... – Mats Petersson Dec 19 '16 at 16:12
  • 23 MB would be fine, but the actual memory usage is more than 23 MB. – yusuf Dec 19 '16 at 16:13
  • Assuming you no longer need the Xorg session and are connecting over ssh without `-X`, why not kill PID 2873? `kill -9 2873` – Rick Lentz Dec 21 '16 at 03:53
  • Are you using batching? Can you try running Theano with `device=cpu` and watching the host memory consumption? What happens to the GPU memory allocation when you start training your network? `watch -n 0.1 nvidia-smi` – Rick Lentz Dec 21 '16 at 04:00
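The `watch -n 0.1 nvidia-smi` suggestion can also be scripted so that the readings are kept as a time series while training starts. This is a sketch that assumes Python 3.7+ and GPU index 0. `gpu_used_mib` and `sample_memory` are hypothetical names, and the query function is passed in as a parameter so it can be replaced.

```python
import subprocess
import time


def gpu_used_mib(index=0):
    """Return the used memory (MiB) on one GPU via nvidia-smi's CSV query mode."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(index),
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def sample_memory(query=gpu_used_mib, interval=0.1, samples=50):
    """Call `query` `samples` times, `interval` seconds apart, and return a
    list of (elapsed_seconds, used_mib) pairs."""
    start = time.monotonic()
    history = []
    for _ in range(samples):
        history.append((time.monotonic() - start, query()))
        time.sleep(interval)
    return history
```

For the CPU comparison, Theano reads its configuration from the `THEANO_FLAGS` environment variable, for example `THEANO_FLAGS=device=cpu python train.py`, where `train.py` stands in for your own training script.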
