This is more of a comment, but worth pointing out.
The general reason is indeed what talonmies commented, but you are summing up the numbers incorrectly. Let's see what happens when tensors are moved to the GPU (I tried this on my PC, with an RTX 2060 and 5.8GiB of usable GPU memory in total). Let's run the following Python commands interactively:
```python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.zeros(1).cuda()
>>> b = torch.zeros(500000000).cuda()
>>> c = torch.zeros(500000000).cuda()
>>> d = torch.zeros(500000000).cuda()
```
The following are the outputs of `watch -n.1 nvidia-smi`:

Right after `import torch`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
```
Right after the creation of `a`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 1251MiB |
```
As you can see, you need 1251MiB just to get PyTorch to start using CUDA, even if you only need a single float.
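Note that this overhead is invisible to PyTorch's own accounting. As a minimal sketch (the exact numbers vary with GPU, driver, and PyTorch/CUDA version), you can query the allocator from inside the process and see almost nothing reported:

```python
import torch

# Trigger CUDA context creation with a tiny allocation.
a = torch.zeros(1).cuda()

# These counters only track memory managed by PyTorch's caching allocator,
# so the ~1251MiB context/kernel overhead shown by nvidia-smi is absent.
print(torch.cuda.memory_allocated())  # a few hundred bytes (one rounded-up block)
print(torch.cuda.memory_reserved())   # a couple of MiB (one allocator segment)
```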
Right after the creation of `b`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 3159MiB |
```
`b` needs 500000000 * 4 bytes = 1907MiB, which matches the increment in memory used by the python process: 3159MiB - 1251MiB = 1908MiB (the 1MiB difference is rounding).
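You can confirm that figure from the tensor itself in the same session (`element_size()` is 4 bytes for the default float32 dtype):

```python
>>> b.element_size() * b.nelement()  # bytes occupied by b's data
2000000000
>>> _ / 2**20                        # in MiB
1907.3486328125
```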
Right after the creation of `c`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 5067MiB |
```
No surprise here: another 1908MiB (5067MiB - 3159MiB).
Right after the creation of `d`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 5067MiB |
```
No further memory is allocated, and the OOM error is thrown:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 5.80 GiB total capacity; 3.73 GiB already allocated; 858.81 MiB free; 3.73 GiB reserved in total by PyTorch)
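If you want to see the allocator state that produced those numbers, you can catch the error and query the same counters the message reports (a sketch; in this PyTorch version the OOM surfaces as a plain RuntimeError):

```python
import torch

try:
    d = torch.zeros(500000000).cuda()
except RuntimeError as e:
    print(e)  # the "CUDA out of memory" message above
    # "already allocated" in the message corresponds to memory_allocated(),
    # "reserved in total by PyTorch" corresponds to memory_reserved().
    print("allocated:", torch.cuda.memory_allocated() / 2**30, "GiB")
    print("reserved: ", torch.cuda.memory_reserved() / 2**30, "GiB")
```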
Obviously:
- The "already allocated" part is included in the "reserved in total by PyTorch" part. You can't sum them up, otherwise the sum exceeds the total available memory.
- The minimum memory required to get PyTorch running on the GPU (1251MiB) is not included in the "reserved in total" part.
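Both points are easy to check programmatically (a minimal sketch; the counters are per-device and cover only memory managed by the caching allocator):

```python
import torch

x = torch.zeros(500000000).cuda()

allocated = torch.cuda.memory_allocated()  # memory occupied by live tensors
reserved = torch.cuda.memory_reserved()    # segments held by the caching allocator

# "already allocated" is a subset of "reserved in total", so never add them up.
assert allocated <= reserved

# Neither counter includes the CUDA context overhead visible in nvidia-smi.
print(allocated / 2**20, reserved / 2**20)  # both ~1907MiB here
```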
So in your case, the sum should consist of:
- 792MiB (reserved in total)
- 1251MiB (minimum to get PyTorch running on the GPU, assuming this is the same for both of us)
- 5.13GiB (free)
- 168MiB + 363MiB + 161MiB = 692MiB (other processes)
These sum up to approximately 7988MiB = 7.80GiB, which is exactly your total GPU memory.
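For completeness, the arithmetic (converting with 1GiB = 1024MiB):

```python
reserved = 792               # "reserved in total by PyTorch"
context = 1251               # minimum to get PyTorch running on the GPU
free = 5.13 * 1024           # "free" in the error message, converted to MiB
others = 168 + 363 + 161     # other processes listed by nvidia-smi

total = reserved + context + free + others
print(total, total / 1024)   # ~7988MiB, ~7.80GiB
```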