I am working on custom object detection with YOLOv5. The network can be given different input image sizes. How can a DNN accept inputs of different sizes? Does YOLO have different backbones for different input sizes?
When I pass --imgsz 640, the YOLO dataloader resizes the images to (384, 672, 3), and when I pass --imgsz 320, the resized images have shape (224, 352, 3). Since conventional CNNs accept fixed, square (equal height and width) inputs, how is YOLO handling the variable image sizes?
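
To make the observation concrete, here is a minimal sketch that reproduces the height and width I see. It is only my guess at the behaviour, not code taken from YOLOv5: it assumes the source images are 16:9 (for example 1080 x 1920), that the longer side is scaled to --imgsz, and that each side is then rounded up to a multiple of a stride of 32 after adding half a stride of padding. The names `rect_shape`, `stride`, and `pad` are hypothetical and chosen for illustration.

```python
import math


def rect_shape(img_h, img_w, imgsz, stride=32, pad=0.5):
    """Hypothetical helper: guess the resized (height, width) for one image."""
    # Express both sides relative to the longer one, so the long side is 1.0.
    long_side = max(img_h, img_w)
    ratio_h, ratio_w = img_h / long_side, img_w / long_side
    # Scale to imgsz, add the assumed padding (in stride units),
    # then round up to the next multiple of the stride.
    new_h = math.ceil(ratio_h * imgsz / stride + pad) * stride
    new_w = math.ceil(ratio_w * imgsz / stride + pad) * stride
    return new_h, new_w


# Assumed 16:9 source image (1080 x 1920); not confirmed for my dataset.
print(rect_shape(1080, 1920, 640))  # (384, 672)
print(rect_shape(1080, 1920, 320))  # (224, 352)
```

If this guess is right, the output is never square and is not even capped at --imgsz on the longer side, which is what makes me wonder how the network copes with it.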