Difference between (1,255,13,13) and (1,13,13,255) in context of YOLO output

Asked May 07 '20 at 08:22

I am trying to decode the output of YOLOv3-tiny when running inference with Intel's OpenVINO toolkit. I am following their demo code, which produces two output blobs. One of them has shape (1,255,13,13). This differs from the v2 output, which had shape (1,13,13,425).

I understand that the 255 in v3 and the 425 in v2 come from the different number of anchors in each version, but the problem is:

The v2 result could be decoded without flattening the blob, whereas the v3 demo flattens it and then extracts the box coordinates and other parameters in a way I can't follow.

I can't understand how the arrangement of the array changes the way one approaches the problem. How does one decide whether to extract the desired results with nested for loops that index into the array, or by flattening it and traversing the values?
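To make the question concrete, here is a minimal sketch of the two access styles on a blob of shape (1,255,13,13). It assumes 3 anchors per scale and 80 classes (so 255 = 3 × 85), and that the 255 channels are ordered anchor by anchor; none of this is taken from the OpenVINO demo, and `flat_index`, `read_nested`, and `read_flat` are hypothetical helper names. Under those assumptions, nested indexing and flat traversal read the same value, and only the index arithmetic differs.

```python
import numpy as np

NUM_ANCHORS = 3            # assumed: anchors per output scale
NUM_CLASSES = 80           # assumed: COCO class count
FIELDS = 5 + NUM_CLASSES   # x, y, w, h, objectness + class scores = 85
GRID = 13

# Stand-in for the network output; real values would come from inference.
blob = np.arange(NUM_ANCHORS * FIELDS * GRID * GRID, dtype=np.float32)
blob = blob.reshape(1, NUM_ANCHORS * FIELDS, GRID, GRID)  # (N, C, H, W)
flat = blob.flatten()  # row-major copy, same element order as in memory


def flat_index(anchor, field, row, col):
    """Offset of one value inside the flattened (N, C, H, W) blob."""
    channel = anchor * FIELDS + field  # assumed anchor-major channel order
    return (channel * GRID + row) * GRID + col


def read_nested(anchor, field, row, col):
    """Read a value by indexing the 4-D array directly."""
    return blob[0, anchor * FIELDS + field, row, col]


def read_flat(anchor, field, row, col):
    """Read the same value from the flattened array."""
    return flat[flat_index(anchor, field, row, col)]
```

For example, `read_nested(2, 4, 7, 9)` and `read_flat(2, 4, 7, 9)` both return the objectness value of the third anchor at grid cell (7, 9), so the choice between the two styles looks like one of convenience rather than necessity.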

Pe Dro

You can just transpose. – Zabir Al Nazi May 07 '20 at 08:24
@ZabirAlNazi Yeah, you are right about the conversion, but I wanted to know why the matrix was flattened for the YOLOv3-tiny output. – Pe Dro May 07 '20 at 08:44
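The transpose suggested in the comment could look roughly like the sketch below. It moves the channel axis to the end to get the v2-style (1,13,13,255) layout, then splits the last axis into per-anchor groups. The 3 × 85 split and the anchor-major channel order are assumptions, and the random array is only a placeholder for a real output blob.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder for a real (N, C, H, W) output blob.
nchw = rng.random((1, 255, 13, 13), dtype=np.float32)

# Move channels last: (1, 255, 13, 13) -> (1, 13, 13, 255).
nhwc = np.transpose(nchw, (0, 2, 3, 1))

# Split the 255 channels into 3 anchors x 85 fields per grid cell
# (assumed anchor-major ordering of the channel axis).
cells = nhwc.reshape(13, 13, 3, 85)

# cells[row, col, anchor] is now one 85-value prediction vector.
prediction = cells[7, 9, 2]
```

After this, nested loops over row, column, and anchor work the same way they would on a channels-last blob.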
