I am trying to use this YOLOv7 model inside Unity with Barracuda to detect objects (the client at work requested this model specifically).
When running the examples under Python, for each frame or image it exports a line (when passing the --save-txt flag) with the box dimensions and a number that I believe corresponds to the class or label of the detected object (whether it is a horse, a car, etc.).
I exported the model to ONNX using the command provided in the repo:
python export.py --weights yolov7-tiny.pt --grid --end2end --simplify \
--topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
However, I noticed that if I import this into Unity, it gives warnings about Resize not being supported and the end2end Flatten not being supported, both of which are documented here.
I removed --end2end from the export flags, which got rid of that warning, and for the Resize warning I followed the advice here:
"Opset 9 would mean Split will be version 2"
so I switched this line inside export.py to use opset version 9 instead of 12.
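For context, I believe the line in question is the torch.onnx.export call; roughly, my edited version looks like this (just an excerpt, not a runnable script; model, img, f, output_names and dynamic_axes come from export.py itself, and the argument list may differ slightly between versions):

torch.onnx.export(model, img, f,
                  verbose=False,
                  opset_version=9,   # was 12; lowered per the Barracuda opset advice
                  input_names=['images'],
                  output_names=output_names,
                  dynamic_axes=dynamic_axes)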
This removed all the warnings. However, the shape of the output of the Barracuda model, once imported into Unity, is now as follows: n:1 h:1 w:85 c:25200.
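In case it helps, I also checked the exported file outside Unity. A quick sanity check with onnxruntime (assuming it and numpy are installed) prints the same numbers for me, so the shape seems to come from the ONNX model itself rather than from the Barracuda import:

# Quick shape check of the exported model outside Unity
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov7-tiny.onnx")
input_name = session.get_inputs()[0].name
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)  # NCHW, same size as --img-size 640 640
for out in session.run(None, {input_name: dummy}):
    print(out.shape)  # I get (1, 25200, 85), which Barracuda reports as n:1 h:1 w:85 c:25200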
So, I have the following questions:
1- What does the 25200 mean? Is it the number of labels an output can have?
2- How do I proceed to extract the boxes and labels? I was expecting the shape to include the number of detections in each image, the label name, the score, and the bounding box.
However, the source code where I believe the processing happens is really messy, and I could not figure out how it does this conversion or extraction, in order to port it to C# in Unity.
Any hints would be appreciated.