
I noticed a memory leak in torch, but I couldn't solve it, so I decided to try to force-clear the video card memory with numba.

I've tried different memory cleanup options with numba (all after from numba import cuda), such as:

1.

cuda.select_device(0)
cuda.close()
cuda.select_device(0)

2.

for_cleaning = cuda.get_current_device()
for_cleaning.reset()

3.

cuda.select_device(0)
cuda.close()

But after clearing the video memory this way, I constantly get errors when trying to load a model onto the GPU.

To reproduce the error, try the following code:


import torch
from torchvision import models
from numba import cuda

device = torch.device("cuda")

model = models.densenet121(pretrained=True)
model.to(device)

# Then any of the suggested codes to clear the GPU memory
for_cleaning = cuda.get_current_device()
for_cleaning.reset()

# Trying to send a new model to the GPU
model = models.inception_v3(pretrained=True)
model.to(device)

Every time I got the same error:

File "C:\\ProgramData\\Anaconda3\\envs\\torch_diploma\\lib\\site-packages\\torch\\nn\\modules\\module.py", line 602, in \_apply
param_applied = fn(param)
File "C:\\ProgramData\\Anaconda3\\envs\\torch_diploma\\lib\\site-packages\\torch\\nn\\modules\\module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid argument

How can I clear GPU memory and reuse the GPU without errors?

P.S. This didn't help either:

gc.collect()  # collecting garbage
torch.cuda.empty_cache()  # cleaning GPU cache

1 Answer


I was facing the exact same problem and was able to solve it.

First, you need to delete the PyTorch model object if it is no longer needed:

del model

Also, make sure that no other variable in your code references the PyTorch model.
After this you can free the VRAM that was allocated by the deleted model:

torch.cuda.empty_cache()

I think your problem is that the garbage collector only frees the memory of objects that are no longer referenced, and in your code example I don't see a line where you unset your model variable.
So, for gc.collect() to work, you need to make sure that no variable references the PyTorch model anymore:

import gc

import torch

model = None  # remove the reference to the PyTorch model object

# Also make sure that no other variable references the model.
# Once there is no reference to the model anymore, the garbage
# collector will delete it from memory.

gc.collect()  # force deletion of the PyTorch model object
torch.cuda.empty_cache()  # free the VRAM that was occupied by the model
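
Applied to the example from the question, the whole sequence might look like this (a sketch, assuming a CUDA-capable GPU; the model names are taken from the question):

import gc

import torch
from torchvision import models

device = torch.device("cuda")

# Load the first model and move it to the GPU
model = models.densenet121(pretrained=True)
model.to(device)

# ... use the model ...

del model                 # drop the only reference to the model
gc.collect()              # reclaim the model object
torch.cuda.empty_cache()  # release the cached VRAM back to the driver

# The GPU can now be reused without resetting the device
model = models.inception_v3(pretrained=True)
model.to(device)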

I executed nvidia-smi before and after the torch.cuda.empty_cache() line to verify that the VRAM was freed.
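
If you want to check this from inside the script instead of with nvidia-smi, PyTorch exposes counters for the allocator state (a minimal sketch; the exact numbers depend on the caching allocator):

import torch

# Bytes currently occupied by live tensors
print(torch.cuda.memory_allocated())
# Bytes currently held by PyTorch's caching allocator (>= memory_allocated)
print(torch.cuda.memory_reserved())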

Edit:
I also want to mention that I tried

cuda.select_device(0)
cuda.close()

before, and it did not work for me either. I think this is because it breaks PyTorch's GPU memory management. (By the way, the same thing happens with TensorFlow.)
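
If you really do need a hard reset of the GPU between models, one pattern that avoids breaking the parent process is to run the CUDA work in a separate process, so the driver frees all of its VRAM when that process exits. A minimal sketch using the standard library (run_model is just an illustrative name, not part of any API):

import multiprocessing as mp

def run_model():
    # Everything CUDA-related happens inside the child process
    import torch
    from torchvision import models

    device = torch.device("cuda")
    model = models.densenet121(pretrained=True).to(device)
    # ... inference or training ...

if __name__ == "__main__":
    # "spawn" gives the child a fresh interpreter and CUDA context
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=run_model)
    p.start()
    p.join()
    # The child has exited, so all of its VRAM has been released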
