8

I downloaded a prepared dataset for YOLOv7 and cloned the YOLOv7 repo.

I want to train a model on this dataset, so I run this command:

python train.py --workers 8 --device 0 --batch-size 16 --data data.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights yolov7x.pt --name yolov7 --hyp data/hyp.scratch.p5.yaml

Training fails with this RuntimeError:

autoanchor: Analyzing anchors... anchors/target = 5.50, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs\train\yolov74
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|          | 0/372 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "D:\projects\yolov7\train.py", line 618, in <module>
    train(hyp, opt, device, tb_writer)
  File "D:\projects\yolov7\train.py", line 363, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs)  # loss scaled by batch_size
  File "D:\projects\yolov7\utils\loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "D:\projects\yolov7\utils\loss.py", line 759, in build_targets
    from_which_layer = from_which_layer[fg_mask_inboxes]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

My system has one CPU and one CUDA GPU (it's a standard gaming PC).

Tristate

4 Answers

27

I believe it's a bug in the current implementation. You can fix it by changing utils/loss.py line 685 to

from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda'))

and also add a line after 756 to put fg_mask_inboxes on your cuda device:

fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda'))
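
If you would rather not hard-code 'cuda', one option is to create the tensor directly on the device of a tensor that already lives in the right place. The sketch below is only an illustration: `layer_index_like` is a hypothetical helper, not part of the YOLOv7 code, and it assumes the reference passed in (for example `b` in the line above) is a torch tensor.

```python
import torch


def layer_index_like(reference, layer_idx):
    """Return a 1-D float tensor filled with layer_idx, one entry per element
    of `reference`, created on the same device as `reference`.

    Hypothetical helper: it mirrors torch.ones(size=(len(b),)) * i, but takes
    the device from `reference` instead of a hard-coded 'cuda' string.
    """
    return torch.full((len(reference),), float(layer_idx), device=reference.device)


# Stand-in for the index tensor used in loss.py; it lives on the CPU here.
b = torch.tensor([3, 1, 4])
layer_ids = layer_index_like(b, 2)
print(layer_ids.device, layer_ids.tolist())
```

On a machine where `b` is on the GPU, the result would be created on the GPU as well, so the same line works for both CPU and CUDA runs.
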
anactualtoaster
  • It worked!! After searching the web for two days... – Oscar Rangel Nov 11 '22 at 16:33
  • It looks like when you train the p6 models you need to use train_aux.py, but it gives a similar error. Do you know how to fix it? ------ yolov7/utils/loss.py", line 1559, in build_targets2 from_which_layer = from_which_layer[fg_mask_inboxes] RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) – Oscar Rangel Nov 11 '22 at 19:51
  • @OscarRangel Try: from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda')) in line 658 – Perry45 Dec 14 '22 at 17:02
  • Thanks, it worked. What I do not understand is why it was working before without this modification. – Walid Ahmed Dec 16 '22 at 04:45
  • I also had to change `"cpu"` to `'cuda'` on line 742 of `utils/loss.py`. – Cubimon Dec 31 '22 at 21:37
2

If you are going to use the p6 models with YOLOv7, you need to use train_aux.py instead of train.py, and you need to change a couple of lines too:

1336 -- from_which_layer.append((torch.ones(size=(len(b),)) * i).to('cuda'))

1407 -- fg_mask_inboxes = fg_mask_inboxes.to(torch.device('cuda'))
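
The line numbers quoted in this thread may not match your checkout, so it can be easier to locate the statements by their text. The following is a minimal sketch using only the standard library; `find_statements` and the `PATTERNS` labels are hypothetical names introduced here, and the two patterns are simply the statements quoted in the answers and the traceback.

```python
import re
from typing import List, Tuple

# Statements mentioned in this thread (hypothetical labels, chosen for this sketch).
PATTERNS = {
    "append": re.compile(r"from_which_layer\.append\("),
    "index": re.compile(r"from_which_layer\s*=\s*from_which_layer\[fg_mask_inboxes\]"),
}


def find_statements(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, label) for every line matching one of PATTERNS."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# Small stand-in for the contents of a loss file.
sample = (
    "x = 1\n"
    "from_which_layer.append(torch.ones(size=(len(b),)) * i)\n"
    "y = 2\n"
    "from_which_layer = from_which_layer[fg_mask_inboxes]\n"
)
print(find_statements(sample))
```

To use it on the real file, read the text first, for example with `pathlib.Path("utils/loss.py").read_text()`, and pass it to `find_statements`.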

Oscar Rangel
2

You can fix it by modifying utils/loss.py. Replace line 759 of utils/loss.py, i.e.,

from_which_layer = from_which_layer[fg_mask_inboxes]

by

from_which_layer = from_which_layer.to(fg_mask_inboxes.device)[fg_mask_inboxes]

The main idea behind this modification is to put both tensors, from_which_layer and fg_mask_inboxes, on the same device before indexing.
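
The same pattern can be tried outside the training loop. This is a minimal sketch, not the YOLOv7 code itself: `index_on_mask_device` is a hypothetical helper, the tensors are toy stand-ins that reuse the names from loss.py, and the script falls back to the CPU when no CUDA device is available.

```python
import torch


def index_on_mask_device(values, mask):
    """Move `values` to the device of `mask`, then apply the boolean mask."""
    return values.to(mask.device)[mask]


# Use the GPU if there is one; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-ins: the values start on the CPU, the mask lives on `device`.
from_which_layer = torch.tensor([0.0, 1.0, 2.0])
fg_mask_inboxes = torch.tensor([True, False, True], device=device)

selected = index_on_mask_device(from_which_layer, fg_mask_inboxes)
print(selected.device, selected.tolist())
```

Because the target device is read from the mask, the helper contains no device string and behaves the same on CPU-only and CUDA machines.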

npn
  • This is the correct way to do it, since it doesn't involve introducing a hard-coded term (i.e., "cuda") into your code. – Hamster Hooey Jun 26 '23 at 20:24
1

https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py

Try changing line 742 to

matching_matrix = torch.zeros_like(cost, device="cpu")

It worked for me.
