I am trying to calculate the memory footprint of my fine-tuned model at inference time. I want to know how much RAM the model will need on a system with no GPU, and how much GPU memory it will need on a system with a GPU.
While measuring this, I observed that when I transfer my fine-tuned (PyTorch) model from CPU to GPU, some additional RAM is allocated, and I cannot understand why that happens. This answer is not comprehensive enough.
To replicate the problem, use this code:
import time

import psutil
import torch

def stathere():
    # Snapshot current GPU allocation (GB), available RAM (GB), and a timestamp
    av = []
    if torch.cuda.is_available():
        av.append(torch.cuda.memory_allocated(torch.device("cuda")) / (1024 * 1024 * 1024))
    else:
        av.append(0)
    av.append(psutil.virtual_memory().available / (1024 * 1024 * 1024))
    a = time.time()
    return av, a

def statnow(av, a):
    # Report the change in GPU allocation, available RAM, and elapsed time
    if torch.cuda.is_available():
        print("Memory taken on GPU", round(torch.cuda.memory_allocated(torch.device("cuda")) / (1024 * 1024 * 1024) - av[0], 3), "GB")
    print("Memory taken on RAM", round(av[1] - (psutil.virtual_memory().available / (1024 * 1024 * 1024)), 3), "GB")
    print(round(time.time() - a), "seconds taken")

av, a = stathere()
print('Tensor on RAM')
g = torch.rand(20000, 20000)
statnow(av, a)
del g

av, a = stathere()
print('Tensor transferred on GPU')
g = torch.rand(20000, 20000).to(torch.device("cuda:0"))
statnow(av, a)
Output
Tensor on RAM
Memory taken on GPU 0.0 GB
Memory taken on RAM 1.566 GB
5 seconds taken
Tensor transferred on GPU
Memory taken on GPU 1.49 GB
Memory taken on RAM 4.024 GB
17 seconds taken
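As a side note on my measurement method: psutil.virtual_memory().available is a system-wide figure, so it also moves with other processes and the page cache, which may add noise to the numbers above. Measuring this process's resident set size (RSS) instead should give a steadier per-process reading. A minimal sketch of that approach (using a plain bytes buffer instead of a tensor so it runs without torch or a GPU):

```python
import psutil

proc = psutil.Process()  # this process

def rss_gb():
    # Resident set size of this process only, in GB
    return proc.memory_info().rss / (1024 ** 3)

before = rss_gb()
buf = b"x" * (512 * 1024 * 1024)  # allocate and touch ~0.5 GB
after = rss_gb()
print(f"RSS grew by {after - before:.2f} GB")
del buf
```

Unlike "available memory", RSS cannot be pushed around by unrelated processes, so a negative delta like the one in my EDIT below should not appear with this method.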
EDIT: Moreover, the (additional) memory allocation on RAM is not additive. Consider this case: when I send a different tensor (g2 = torch.rand(10000,15000)) to the GPU, I get a different memory consumption on RAM (0.9 GB). But when I send both tensors (g and g2) to the GPU, the change in available RAM is negative (-1.4 GB).