I am trying to decode the output of YOLOv3-tiny when running inference with Intel's OpenVINO toolkit. I am following their demo code, which produces 2 output blobs. One of them has shape (1,255,13,13). This is unlike the v2 output, which had shape (1,13,13,425).
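I understand that the 255 in v3 and the 425 in v2 come from the different number of anchors per grid cell (if I have it right, 255 = 3 x 85 and 425 = 5 x 85, where 85 is 4 box values, 1 objectness score and 80 class scores), but the problem is: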
The v2 result could be decoded without flattening this blob, whereas the v3 demo flattens it and then uses mysterious magical methods to extract the box coordinates and other parameters.
I can't understand how the arrangement of the matrix/array could change the way one approaches the problem. I mean, how do they decide whether the desired results should be extracted with nested for loops that go deeper into the array, or by flattening it and then traversing the values?
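To make the question concrete, here is a minimal sketch of the two access patterns as I currently understand them. It is not the demo's code: the names (nested_lookup, flat_lookup) are made up for illustration, and it assumes 3 anchors, 80 classes, and that the 255 channels are grouped anchor by anchor, with the 85 fields of one anchor stored consecutively.

```python
import numpy as np

# Assumed values (not taken from the demo): 3 anchors, 80 classes, 13x13 grid.
NUM_ANCHORS, NUM_CLASSES, GRID = 3, 80, 13
FIELDS = 5 + NUM_CLASSES  # x, y, w, h, objectness, then the class scores

# Stand-in for the (1, 255, 13, 13) blob, filled with distinct values
# so that every position can be told apart.
blob = np.arange(NUM_ANCHORS * FIELDS * GRID * GRID, dtype=np.float32)
blob = blob.reshape(1, NUM_ANCHORS * FIELDS, GRID, GRID)


def nested_lookup(blob, anchor, field, row, col):
    """Read one value by indexing the 4-D array directly."""
    return blob[0, anchor * FIELDS + field, row, col]


def flat_lookup(flat, anchor, field, row, col):
    """Read the same value from the flattened array via a computed offset."""
    channel = anchor * FIELDS + field
    return flat[channel * GRID * GRID + row * GRID + col]


# Row-major (C-order) flattening: the last axis varies fastest.
flat = blob.ravel()

# Objectness (field 4) of anchor 2 at grid cell (row 7, col 9), read both ways.
print(nested_lookup(blob, 2, 4, 7, 9) == flat_lookup(flat, 2, 4, 7, 9))
```

If this is right, the two approaches read the same numbers and only the index bookkeeping differs, so what I am really asking is why the demo prefers the flattened form.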