Custom trained YOLOv7 detect object frame by frame in a video

Question

I want to process a video frame by frame and, for each frame, detect whether a certain object has appeared, using an already custom-trained YOLOv7 model. This is straightforward for an image or a set of images: after loading the model, you pass the path to the image and then extract the important values, such as the upper-left corner, bottom-right corner, confidence, and class, as in the following code:

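import torch

# Load the custom-trained weights through torch.hub, then run inference on an image path
model=torch.hub.load('WongKinYiu/yolov7','custom','best.pt',force_reload=True)
res=model(img_path)
df=res.pandas().xyxy[0]  # one row per detection: box corners, confidence, class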

But I don't know how to use this model on an image that has already been read into a variable with the OpenCV module:

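import cv2

cap=cv2.VideoCapture('iohannis1.mp4')
while cap.isOpened():
    ret,frame=cap.read()  # frame is a NumPy array (height x width x channels)
    if not ret:  # end of the video or a read error
        break

    cv2.imshow('Frame',frame)
    if cv2.waitKey(1) & 0xFF==ord('q'):  # press q to stop early
        break

cap.release()
cv2.destroyAllWindows()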

In this case I want to apply the model to each frame and obtain a list with the important data: upper-left corner, bottom-right corner, confidence, and class. A naive option would be to save each frame to disk and pass its path to the model, but I expect that to be very slow. I also don't want to use the detect script from YOLOv7, because it is too constraining: it parses the video, draws a box around each region of interest, and saves the video. I want to do further processing with the data obtained from each frame.
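The loop described above could be organized roughly as follows. This is only a sketch under assumptions: `iter_frames` and `detect_per_frame` are hypothetical helper names, the capture object is anything with a `read()` method that returns `(ret, frame)` the way `cv2.VideoCapture` does, and the model is assumed to be callable on an in-memory frame, which is exactly the point still in doubt here.

```python
from typing import Any, Callable, Iterator, List, Tuple


def iter_frames(cap: Any) -> Iterator[Any]:
    """Yield frames from a cv2.VideoCapture-like object until read() fails."""
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        yield frame


def detect_per_frame(model: Callable[[Any], Any], cap: Any) -> List[Tuple[int, Any]]:
    """Run `model` on every frame and keep (frame_index, result) pairs in memory."""
    results = []
    for index, frame in enumerate(iter_frames(cap)):
        results.append((index, model(frame)))
    return results


# Intended use (assumption: the hub model accepts an in-memory frame):
#   cap = cv2.VideoCapture('iohannis1.mp4')
#   model = torch.hub.load('WongKinYiu/yolov7', 'custom', 'best.pt')
#   for index, res in detect_per_frame(model, cap):
#       df = res.pandas().xyxy[0]  # same accessor as used for image paths
#   cap.release()
```

Keeping the model as a parameter means nothing is written to disk and the per-frame results stay available for further processing.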

Can I do this with torch, or should I convert the model to another format and load it with another framework such as TensorFlow or ONNX?

I tried to convert the model, but I did not manage to use the converted version. I also ran the original model on the same image twice, once from its path and once from an array loaded with OpenCV, and got two different results: the path gave a good result and the loaded image gave a bad one.

import torch
import cv2

model=torch.hub.load('WongKinYiu/yolov7','custom','best.pt')
source_path="img_iohannis1.jpg"
img=cv2.imread(source_path)

res1=model(source_path)
res2=model(img)

df1=res1.pandas().xyxy[0]
df2=res2.pandas().xyxy[0]

print("Result 1 from image path is")
print(df1)
print("Result 2 from loaded image is")
print(df2)

you haven't tried `res = model(frame)` or `res = model([frame])`, have you? — Christoph Rackwitz, Apr 23 '23 at 18:40
I tried this for an image, and there is a problem with this approach: it gives different results for the same image. When I use the path to the image it gives good results, but when I use the loaded image it gives extremely bad results (most of the time it detects nothing). — Andrei, Apr 24 '23 at 09:19
without knowing much about pytorch or torchhub, perhaps the thing is bothered by OpenCV's BGR order of color channels, or maybe it wants the input transposed (from HWC to NCHW/NHWC) — Christoph Rackwitz, Apr 24 '23 at 21:53
@ChristophRackwitz This sounds right actually: images in PyTorch should be formatted as `batch_size x rgb_channels x height x width`. https://discuss.pytorch.org/t/dimensions-of-an-input-image/19439 — Brock Brown, Apr 25 '23 at 20:36
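The channel-order idea from the comments can be tried with a small helper. OpenCV's `cv2.imread` and `cap.read()` return arrays in BGR order, so reversing the last axis gives RGB. This is a sketch, not a confirmed fix: it assumes the object returned by `torch.hub.load` accepts an HxWx3 NumPy array in RGB order, which this thread does not verify, and `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an HxWx3 image (BGR -> RGB), returning a contiguous copy."""
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    # Slicing with ::-1 creates a view with negative strides; copy it into
    # contiguous memory so downstream code receives an ordinary array.
    return np.ascontiguousarray(frame[:, :, ::-1])


# Hypothetical use with the objects from the question (assumption: RGB input is expected):
#   res2 = model(bgr_to_rgb(img))
#   df2 = res2.pandas().xyxy[0]
```

If `df2` then matches `df1`, channel order was the cause; if not, the input layout mentioned in the comments would be the next thing to check.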
