
I am training a YOLOv5 "L6" model for an important project. I have a very large dataset of UAV and drone images, and I need to train with a large input size. (A few months ago I trained the "M" model at 640x640 on an RTX 3060.) That model performs unevenly: detection of some categories (vehicles, landing areas, etc.) is really good, but when it comes to small objects such as humans the model struggles and gets confused. So I decided to train at 1280x1280, and a month ago I bought an RTX 3090 Ti. I run my code in WSL 2, which is fully configured for DL/ML.

The point is that whenever I run any YOLOv5 model with an input size larger than 640x640, I get the error below. In the example below I ran the "M6" model with a batch size of 8 and a 1280x1280 input size; VRAM usage is around 12 GB, so the problem is not simply that the model is too large. It also does not look like an ordinary out-of-memory situation: when I tried the "L6" model with a batch size of 16 at 1280x1280, VRAM usage climbed above 24 GB and it crashed instantly with a CUDA out-of-memory error that, as usual, reported the failed allocation.

  File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 640, in <module>
    main(opt)
  File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 529, in main
    train(opt.hyp, opt, device, callbacks)
  File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/train.py", line 352, in train
    results, maps, _ = validate.run(data_dict,
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/val.py", line 198, in run
    for batch_i, (im, targets, paths, shapes) in enumerate(pbar):
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/mnt/d/Ubuntu-WSL-Workspace/Code_Space/Code Workspace/Python Projects/AI Workspace/Teknofest-AI-in-T/2023YOLOV5/Last-YOLOV5/yolov5/utils/dataloaders.py", line 172, in __iter__
    yield next(self.iterator)
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 34, in do_one_step
    data = pin_memory(data, device)
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 67, in pin_memory
    return [pin_memory(sample, device) for sample in data]  # Backwards compatibility.
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 67, in <listcomp>
    return [pin_memory(sample, device) for sample in data]  # Backwards compatibility.
  File "/home/yigit-ai-dev/.pyenv/versions/3.10.9/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 55, in pin_memory
    return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
Tony Stark

3 Answers


It may be related to WSL2, which can both prevent you from using most of your system's RAM and constrain the memory available to a single application; that is one of the known limitations of WSL.

The NVIDIA CUDA on WSL user guide documents these limitations: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

"Pinned system memory (example: System memory that an application makes resident for GPU accesses) availability for applications is limited."

"For example, some deep learning training workloads, depending on the framework, model and dataset size used, can exceed this limit and may not work."

Regarding how to fix this problem, the following thread provides some advice:

https://github.com/huggingface/diffusers/issues/807

Setting a higher RAM limit for WSL and updating the distribution may help you make better use of your hardware resources.

Modify the .wslconfig file to set a higher amount of system memory, and run wsl --update from Windows to update WSL.
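For reference, a minimal .wslconfig, placed in your Windows user profile folder (%USERPROFILE%\.wslconfig), might look like the sketch below; the memory, swap, and processor values are only example figures to adjust for your own machine:

```
# %USERPROFILE%\.wslconfig -- settings apply to the WSL 2 VM
[wsl2]
# RAM cap for the WSL 2 VM (example value, adjust to your system)
memory=24GB
# Size of the WSL 2 swap file (example value)
swap=8GB
# Logical processors exposed to WSL 2 (example value)
processors=8
```

After saving the file, run wsl --shutdown from Windows and reopen your distribution so the new limits take effect.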

Mr K.
  • Thanks for the reply. I saw your answer after I had already solved my problem, and what you said is absolutely true. I had been using physical Arch Linux beforehand, so to test whether the problem was WSL I installed Arch Linux as a dual boot and tried again, and it worked much better: VRAM usage was about 3-4 GB lower on Arch than on Windows (because Windows' own resource usage is a bit higher than Linux's). I completed the training at 1280x1280 resolution without problems. – Tony Stark Apr 20 '23 at 05:17

I can think of a few ways to get your training to start.

  1. Decrease your batch size.
  2. You say you have an RTX 3060 and an RTX 3090. If your computer can handle both, use them both during training.
  3. Switch your training to FP16 precision.
  4. Crop your images and then train on the cropped data; that shouldn't change the accuracy much (a rough sketch follows this list).
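
As a rough illustration of point 4, the sketch below tiles a large UAV frame into overlapping crops with OpenCV. The tile size, overlap, and file names are assumptions, and the YOLO label boxes would still have to be remapped into each tile's coordinates, which is not shown:

```
# Sketch: split a large frame into overlapping tiles for training or inference.
import cv2

def tile_image(path, tile=640, overlap=128):
    """Yield (x0, y0, crop) tiles that together cover the whole image."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
            yield x0, y0, img[y0:y1, x0:x1]

# Example: write out the tiles of one 1920x1080 frame (file name is hypothetical).
for i, (x0, y0, crop) in enumerate(tile_image("frame_0001.jpg")):
    cv2.imwrite(f"frame_0001_tile{i}_{x0}_{y0}.jpg", crop)
```

The overlap is there so that an object sitting on a tile boundary still appears whole in at least one tile.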
Toyo
  • I can't crop, because these are UAV images and I am in a competition; each image contains many objects. I also think FP16 precision would hurt accuracy badly. Lastly, I already decreased the batch size to 8 with the "M6" model, and total memory usage on the system is 13 GB, so 11 GB is free. Why am I getting an error like this? It makes no sense. Soon I will work more deeply with the PyTorch framework and I think I will understand it then, but right now I have to build the final model for the competition (1 week remains). – Tony Stark Apr 18 '23 at 07:15
  • I don't understand why you can't crop; you are using a computer, right? Crop the images with OpenCV or a similar library and feed the cropped data through your model: crop, run inference, get the results, and merge them. Getting this error is fairly normal; 1280x1280 with a batch size of 8 will use a lot of memory even with YOLOv5 L6. – Toyo Apr 18 '23 at 07:22
  • I can't crop because my dataset mostly consists of 1920x1080 images; as I said, they are UAV images, so there are objects almost everywhere in the frame, and if I crop I will lose many objects. I could write the script (that part is easy), but cropping isn't an option in this case; at most I could write a resizing script. – Tony Stark Apr 18 '23 at 07:32
  • And why can't I train with 24 GB of VRAM when only 13 GB is being used? – Tony Stark Apr 18 '23 at 07:33
  • I mean, your suggestions are the typical ones; I already know them, and I have already trained a smaller model with a smaller input size. If I have enough VRAM, why should I reduce parameters? That would mean I bought the 3090 Ti for nothing; technically, it doesn't feel right at all. In addition, it happens at every resolution except 640: the input is padded to a multiple of the stride, which in YOLO is 16, so even with an input size of 656 I get the same error despite low VRAM usage, and increasing or decreasing the batch size does not change whether I get the error. – Tony Stark Apr 18 '23 at 07:39

I solved the problem by going back to physical Linux. In my case I think the problem was WSL, because the computer has to reserve resources for both Linux and Windows, so the available computing resources are more limited.

Tony Stark