
I am trying to calculate the memory footprint of my fine-tuned PyTorch model at inference time. I want to know how much RAM the model will need on a system without a GPU, and how much GPU memory it will need on a system with one.
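For context, the baseline number I start from is the raw weight footprint: summing the byte size of every parameter and buffer. This ignores activations, CUDA context, and framework overhead, so it is a lower bound, not the full answer I am asking about. A minimal sketch with a toy model (the `Linear` layer here is just a stand-in, not my actual model):

```python
import torch

# Stand-in model; substitute any fine-tuned model here.
model = torch.nn.Linear(1000, 1000)

# Bytes held by learnable parameters (weight + bias for Linear).
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
# Bytes held by non-learnable buffers (e.g. BatchNorm running stats).
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())

print(f"Weights + buffers: {(param_bytes + buffer_bytes) / 1024**2:.2f} MiB")
```

For `Linear(1000, 1000)` in float32 this is 1000×1000 weights plus 1000 biases, i.e. 4,004,000 bytes.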

While measuring this, I observed that when I transfer my fine-tuned (PyTorch) model from CPU to GPU, additional RAM is allocated for it. I am not able to understand why that happens. This answer is not comprehensive enough.

To replicate the problem use this code:

import time
import torch
import psutil

def stathere():
    # Record baseline GPU memory (if available), available RAM, and a start time.
    av = []
    if torch.cuda.is_available():
        av.append(torch.cuda.memory_allocated(torch.device("cuda")) / (1024**3))  # GB
    else:
        av.append(0)
    av.append(psutil.virtual_memory().available / (1024**3))  # available RAM in GB
    a = time.time()
    return av, a

def statnow(av, a):
    # Report the change in GPU memory, RAM, and elapsed time since stathere().
    if torch.cuda.is_available():
        print("Memory taken on GPU", round(torch.cuda.memory_allocated(torch.device("cuda")) / (1024**3) - av[0], 3), "GB")
    print("Memory taken on RAM", round(av[1] - psutil.virtual_memory().available / (1024**3), 3), "GB")
    print(round(time.time() - a), "seconds taken")
    return

av, a = stathere()
print('Tensor on RAM')
g = torch.rand(20000,20000)
statnow(av,a)

del g

av, a = stathere()
print('Tensor transferred on GPU')
g = torch.rand(20000,20000).to(torch.device("cuda:0"))
statnow(av,a)

Output

Tensor on RAM
Memory taken on GPU 0.0 GB
Memory taken on RAM 1.566 GB
5 seconds taken
Tensor transferred on GPU
Memory taken on GPU 1.49 GB
Memory taken on RAM 4.024 GB
17 seconds taken
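One factor I suspect (this is an assumption on my part, not something I have confirmed) is that the very first CUDA operation creates a CUDA context, which itself consumes host RAM independently of any tensor. A sketch to measure that cost in isolation, before any tensor is moved:

```python
import torch
import psutil

def ram_gb():
    # Available system RAM in GB.
    return psutil.virtual_memory().available / (1024**3)

if torch.cuda.is_available():
    before = ram_gb()
    torch.cuda.init()  # force CUDA context creation without allocating tensors
    torch.cuda.synchronize()
    print(f"Host RAM consumed by CUDA context: {before - ram_gb():.3f} GB")
else:
    print("CUDA not available; nothing to measure")
```

If the RAM delta here is comparable to the unexplained RAM in my output above, the context would account for part of it, but not necessarily all.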

EDIT: Moreover, the (additional) memory allocated on RAM is not additive. Consider this case: when I send a different tensor (g2 = torch.rand(10000,15000)) to the GPU, I get a different RAM consumption (0.9 GB). But when I send both tensors (g and g2) to the GPU, the RAM consumption is negative (-1.4 GB).
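For completeness, the combined case can be reproduced with the same measurement approach as above (guarded so it degrades gracefully without a GPU; the RAM deltas will vary from run to run):

```python
import torch
import psutil

def ram_gb():
    # Available system RAM in GB.
    return psutil.virtual_memory().available / (1024**3)

if torch.cuda.is_available():
    before = ram_gb()
    g = torch.rand(20000, 20000).to("cuda:0")    # ~1.49 GB in float32
    g2 = torch.rand(10000, 15000).to("cuda:0")   # ~0.56 GB in float32
    torch.cuda.synchronize()
    print(f"RAM delta with both tensors on GPU: {before - ram_gb():.3f} GB")
else:
    print("CUDA not available; experiment skipped")
```

A negative delta means more RAM was available after the transfer than before, which is why I say the overhead is not additive.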

Ritwik

0 Answers