This is more of a comment, but worth pointing out.
The general reason is indeed what talonmies commented, but you are summing up the numbers incorrectly. Let's see what happens when tensors are moved to the GPU (I tried this on my PC, with an RTX 2060 and 5.8GiB of usable GPU memory in total). Let's run the following Python commands interactively:
```python
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.zeros(1).cuda()
>>> b = torch.zeros(500000000).cuda()
>>> c = torch.zeros(500000000).cuda()
>>> d = torch.zeros(500000000).cuda()
```
The following are the outputs of `watch -n.1 nvidia-smi`:

Right after `import torch`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
```
Right after the creation of `a`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 1251MiB |
```
As you can see, you need 1251MiB just to get PyTorch to start using CUDA, even if you only need a single float.
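Note that this overhead is invisible to PyTorch's own accounting. As a minimal sketch (the exact numbers vary with GPU, driver, and PyTorch/CUDA version), you can query the allocator from inside the process and see almost nothing reported:

```python
import torch

# Trigger CUDA context creation with a tiny allocation.
a = torch.zeros(1).cuda()

# These counters only track memory managed by PyTorch's caching allocator,
# so the ~1251MiB context/kernel overhead shown by nvidia-smi is absent.
print(torch.cuda.memory_allocated())  # a few hundred bytes (one rounded-up block)
print(torch.cuda.memory_reserved())   # a couple of MiB (one allocator segment)
```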
Right after the creation of `b`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 3159MiB |
```
`b` needs 500000000 * 4 bytes = 1907MiB, which matches the increment in memory used by the python process: 3159MiB - 1251MiB = 1908MiB (the 1MiB difference is rounding).
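You can confirm that figure from the tensor itself in the same session (`element_size()` is 4 bytes for the default float32 dtype):

```python
>>> b.element_size() * b.nelement()  # bytes occupied by b's data
2000000000
>>> _ / 2**20                        # in MiB
1907.3486328125
```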
Right after the creation of `c`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 5067MiB |
```
No surprise here: another 1908MiB (5067MiB - 3159MiB).
Right after the creation of `d`:

```
| 0 N/A N/A 1121 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 14701 C python 5067MiB |
```
No further memory is allocated, and the OOM error is thrown:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 5.80 GiB total capacity; 3.73 GiB already allocated; 858.81 MiB free; 3.73 GiB reserved in total by PyTorch)
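If you want to see the allocator state that produced those numbers, you can catch the error and query the same counters the message reports (a sketch; in this PyTorch version the OOM surfaces as a plain RuntimeError):

```python
import torch

try:
    d = torch.zeros(500000000).cuda()
except RuntimeError as e:
    print(e)  # the "CUDA out of memory" message above
    # "already allocated" in the message corresponds to memory_allocated(),
    # "reserved in total by PyTorch" corresponds to memory_reserved().
    print("allocated:", torch.cuda.memory_allocated() / 2**30, "GiB")
    print("reserved: ", torch.cuda.memory_reserved() / 2**30, "GiB")
```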
Obviously:
- The "already allocated" part is included in the "reserved in total by PyTorch" part. You can't sum them up, otherwise the sum exceeds the total available memory.
- The minimum memory required to get PyTorch running on the GPU (1251MiB) is not included in the "reserved in total" part.
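Both points are easy to check programmatically (a minimal sketch; the counters are per-device and cover only memory managed by the caching allocator):

```python
import torch

x = torch.zeros(500000000).cuda()

allocated = torch.cuda.memory_allocated()  # memory occupied by live tensors
reserved = torch.cuda.memory_reserved()    # segments held by the caching allocator

# "already allocated" is a subset of "reserved in total", so never add them up.
assert allocated <= reserved

# Neither counter includes the CUDA context overhead visible in nvidia-smi.
print(allocated / 2**20, reserved / 2**20)  # both ~1907MiB here
```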
So in your case, the sum should consist of:
- 792MiB (reserved in total)
- 1251MiB (minimum to get PyTorch running on the GPU, assuming this is the same for both of us)
- 5.13GiB (free)
- 168MiB + 363MiB + 161MiB = 692MiB (other processes)
These sum up to approximately 7988MiB = 7.80GiB, which is exactly your total GPU memory.
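For completeness, the arithmetic (converting with 1GiB = 1024MiB):

```python
reserved = 792               # "reserved in total by PyTorch"
context = 1251               # minimum to get PyTorch running on the GPU
free = 5.13 * 1024           # "free" in the error message, converted to MiB
others = 168 + 363 + 161     # other processes listed by nvidia-smi

total = reserved + context + free + others
print(total, total / 1024)   # ~7988MiB, ~7.80GiB
```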