YOLOv7 not training on 10540 images dataset

Question

The main problem is that training fails with an error saying I don't have enough memory. I don't know if this is a solvable issue.

Traceback (most recent call last):
  File "c:\Users\LinusFackler\Documents\GitHub\YOLOvCAPY\yolov7-custom\train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "c:\Users\LinusFackler\Documents\GitHub\YOLOvCAPY\yolov7-custom\train.py", line 338, in train
    imgs = imgs.to(device, non_blocking=True).float() / 255.0  # uint8 to float32, 0-255 to 0.0-1.0
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 786432000 bytes.

train.py is called with the following command:

!python train.py --img 640 --batch-size 160 --epochs 1 --data "data\custom.yaml" --cfg "cfg\training\yolov7-custom.yaml" --weights yolov7.pt --name yolov7-custom --device cpu --workers 5 --hyp "data\hyp.scratch.custom.yaml"
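
The byte count in the error above lines up exactly with one float32 image batch at these settings, which suggests the batch size itself is what exhausts RAM. A minimal sketch of the arithmetic, assuming 3-channel square images and 4 bytes per float32 value (the function name is only illustrative, and this counts the input tensor alone, not activations or gradients):

```python
def batch_bytes(batch_size, img_size, channels=3, bytes_per_value=4):
    """Bytes needed for one float32 image batch of shape (B, C, H, W)."""
    return batch_size * channels * img_size * img_size * bytes_per_value


# Settings from the command above: --batch-size 160 --img 640
print(batch_bytes(160, 640))  # 786432000, the figure in the RuntimeError

# The same input tensor at much smaller batch sizes
for bs in (2, 4, 6):
    print(bs, batch_bytes(bs, 640) / 1e6, "MB")
```

The input batch is only one of the allocations made during a training step, so the real memory footprint is likely to be considerably larger than this figure.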

This is the cfg YAML file:

nc: 12  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

# anchors
anchors:
  - [12,16, 19,36, 40,28]  # P3/8
  - [36,75, 76,55, 72,146]  # P4/16
  - [142,110, 192,243, 459,401]  # P5/32
...

I did not share the whole file, just the part where the changes were made.

This is the data config YAML:

train: C:\Users\LinusFackler\Documents\GitHub\YOLOvCAPY\train
test: C:\Users\LinusFackler\Documents\GitHub\YOLOvCAPY\test
val: C:\Users\LinusFackler\Documents\GitHub\YOLOvCAPY\val
#Classes
nc:  12 # replace classes count 
#classes names
#replace all class names list with your custom classes
names: ['person', 'car', 'bike', 'motor', 'bus', 'truck', 'light',
        'hydrant', 'sign', 'skateboard', 'scooter', 'other vehicle']

Reduce the batch size and the number of workers, man: batch size from 160 to 2, 4, or 6, and workers from 5 to 2 or 3. — Ahmad Anis, Nov 21 '22 at 07:42
@AhmadAnis I will try that! Honestly should have thought of that lmao — Liferafter, Nov 21 '22 at 08:20
@AhmadAnis Did not work. I also reduced the dataset to 100 images, and it still throws the same error. — Liferafter, Nov 21 '22 at 09:24
Looks like you're calling it from a Jupyter notebook. Why not try it in a plain terminal after preparing your complete dataset? Jupyter can behave weirdly sometimes. — Ahmad Anis, Nov 21 '22 at 10:02
@AhmadAnis We got it to work with 1 worker and 5 images per batch, but training 1 epoch on 100 images took 5 minutes. That scales to almost 9 hours for 1 epoch of the full 10,540 images. Is there a way to reduce the training time without just increasing the computational resources of the computer? — Liferafter, Nov 22 '22 at 03:50
Maybe try another algorithm that works better on CPU, such as Faster R-CNN or YOLOv5. Use a smaller input image size (you're using 640; change it to 224 or 128), and normalize the input images (you might need to do that manually). — Ahmad Anis, Nov 22 '22 at 05:23
You can also use OpenVINO or TFLite for a speedup with YOLOv5 (https://github.com/ultralytics/yolov5/issues/251). — Ahmad Anis, Nov 22 '22 at 06:02
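
The numbers in the comments above can be checked with a rough back-of-the-envelope calculation. This is a minimal sketch, assuming epoch time grows linearly with the image count and, more loosely, with the number of pixels per image. Neither assumption was measured in this thread, and the helper names are only illustrative.

```python
def epoch_minutes(n_images, minutes_per_100=5.0):
    """Linear extrapolation from the reported 5 minutes per 100 images."""
    return n_images / 100 * minutes_per_100


def pixel_ratio(new_size, old_size=640):
    """Fraction of pixels left per image after shrinking a square input."""
    return (new_size / old_size) ** 2


full = epoch_minutes(10540)
print(f"{full:.0f} min = {full / 60:.1f} h per epoch at --img 640")

# Hypothetical estimate only: assumes time is proportional to pixel count
for size in (224, 128):
    ratio = pixel_ratio(size)
    print(f"--img {size}: {ratio:.1%} of the pixels, "
          f"about {full * ratio / 60:.1f} h per epoch")
```

Real speedups on CPU will probably be smaller than the pixel ratio suggests, since data loading and other fixed costs do not shrink with the image size.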
