I am trying to detect playing cards with YOLOv8. There are 52 classes, and sometimes, especially when the cards are in motion, the model predicts the wrong card. For example, I can get a result such as:
```
boxes: tensor([[3.9575e+02, 4.4631e+01, 6.3413e+02, 2.2286e+02, 4.4027e-01, 2.3000e+01]], device='cuda:0')
cls: tensor([23.], device='cuda:0')
conf: tensor([0.4403], device='cuda:0')
data: tensor([[3.9575e+02, 4.4631e+01, 6.3413e+02, 2.2286e+02, 4.4027e-01, 2.3000e+01]], device='cuda:0')
id: None
is_track: False
orig_shape: (720, 1280)
shape: torch.Size([1, 6])
xywh: tensor([[514.9391, 133.7455, 238.3799, 178.2292]], device='cuda:0')
xywhn: tensor([[0.4023, 0.1858, 0.1862, 0.2475]], device='cuda:0')
xyxy: tensor([[395.7491, 44.6309, 634.1290, 222.8602]], device='cuda:0')
xyxyn: tensor([[0.3092, 0.0620, 0.4954, 0.3095]], device='cuda:0')
```
This prediction (cls 23, confidence 0.4403) was wrong.
Instead of basing the prediction on a single frame, I was thinking of recording the top 3 class predictions (by confidence) in each frame, summing them over all the frames in which the object appears, and picking the class with the best overall score.
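The voting idea can be sketched in plain Python. This is a minimal sketch, assuming each object's per-frame candidates are already available as (class_id, confidence) pairs; the function name `vote_class` and that input layout are hypothetical, not part of Ultralytics.

```python
from collections import defaultdict


def vote_class(frames, top_k=3):
    """Pick the class with the highest summed confidence across frames.

    Hypothetical input layout: `frames` has one entry per frame in which the
    object was seen, and each entry is a list of (class_id, confidence) pairs.
    Only the `top_k` most confident pairs of each frame are counted.
    Returns (class_id, total_confidence), or None if there were no candidates.
    """
    totals = defaultdict(float)
    for candidates in frames:
        # Keep this frame's top_k candidates, highest confidence first.
        best = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_k]
        for class_id, confidence in best:
            totals[class_id] += confidence
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])
```

With [(23, 0.44), (10, 0.40)] in the first frame and class 10 leading at 0.81 and 0.77 in the next two, class 10 wins with a total of about 1.98 against about 0.49 for class 23. Summing rewards classes that are seen consistently; averaging, or counting only top-1 votes, are simple alternatives.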
So if, say, cls=23 has a confidence of 0.44, is there an easy way to get the list of other classes whose confidence exceeded a certain threshold for the same box?
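One way to get those scores is to read them before non-maximum suppression, since each box in `r.boxes` only carries a single `cls` and `conf`. The sketch below assumes you have obtained the raw detection output for one image as a nested list (for example via `tensor.tolist()`) laid out as (4 + num_classes) rows by num_candidates columns, with rows 0 to 3 holding the box and the remaining rows holding per-class scores. That layout and the helper `classes_above_threshold` are assumptions to verify against your model, not a documented Ultralytics API.

```python
def classes_above_threshold(raw, box_index, threshold=0.25, top_k=3):
    """Return up to `top_k` (class_id, score) pairs scoring above `threshold`.

    Assumed layout of `raw`: (4 + num_classes) rows x num_candidates columns,
    rows 0-3 = box (x, y, w, h), rows 4.. = one score row per class.
    `box_index` selects the candidate box (column).
    """
    # Column `box_index` of every class row gives that box's per-class scores.
    scores = [row[box_index] for row in raw[4:]]
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(class_id, score) for class_id, score in ranked[:top_k] if score > threshold]
```

For the 52-card model this would return, for one box, up to three alternative cards worth feeding into a cross-frame vote.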
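```python
import math
import cv2
import cvzone
from ultralytics import YOLO

model = YOLO("best.pt")    # hypothetical path to the trained 52-class weights
classNames = model.names   # {class_id: name}; or your own list of card names
cap = cv2.VideoCapture(0)  # webcam; pass a file path to read a video instead

while True:
    success, img = cap.read()
    if not success:  # end of the stream or camera error
        break
    results = model(img, stream=True)
    for r in results:
        boxes = r.boxes
        for box in boxes:
            # Bounding box
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            w, h = x2 - x1, y2 - y1
            cvzone.cornerRect(img, (x1, y1, w, h))
            # Confidence, rounded up to two decimals
            conf = math.ceil(float(box.conf[0]) * 100) / 100
            # Class name (the single class reported for this box)
            cls = int(box.cls[0])
            cvzone.putTextRect(img, f'{conf} {classNames[cls]}',
                               (max(0, x1), max(35, y1 - 20)), scale=0.7, thickness=1)
    cv2.imshow("Image", img)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
```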
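Summing scores "for an object" also requires knowing which box in the new frame is the same card as in the previous one. A minimal sketch of greedy IoU matching on `xyxy` boxes follows; `iou`, `match_boxes` and the `min_iou=0.3` default are hypothetical choices. Ultralytics' tracking mode (`model.track(...)`), which fills in the `id` field printed as `None` above, is likely the more robust option.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(previous, current, min_iou=0.3):
    """Greedily match current-frame boxes to previous-frame objects by IoU.

    `previous` maps object_id -> box from the last frame (hypothetical bookkeeping);
    `current` is a list of boxes. Returns {object_id: index into current}.
    Boxes left unmatched can be treated as new objects by the caller.
    """
    # All (IoU, object_id, box_index) pairs, best overlap first.
    pairs = sorted(
        ((iou(box, cur), obj_id, i)
         for obj_id, box in previous.items()
         for i, cur in enumerate(current)),
        key=lambda t: t[0],
        reverse=True,
    )
    matches, used_ids, used_boxes = {}, set(), set()
    for score, obj_id, i in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if obj_id in used_ids or i in used_boxes:
            continue
        matches[obj_id] = i
        used_ids.add(obj_id)
        used_boxes.add(i)
    return matches
```

Each matched object can then accumulate its per-frame candidates under a stable key, which is what a cross-frame vote needs.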