I'm working on a project that trains an ML model to predict the location of Waldo in a Where's Wally? image, using AWS SageMaker's built-in object detection algorithm, whose underlying network is the Single Shot Detector (SSD). My worry is that training directly on an actual puzzle image with dimensions like 2000 x 2000 isn't viable, because SSD will auto-resize the image to 300 x 300, which would render Waldo a meaningless blur.

Does SSD resize images automatically, or will it train on the 2000 x 2000 image? Should I crop/resize all puzzles into 300 x 300 images containing Waldo, or can I include a mix of actual puzzle images with dimensions 2000+ x 2000+ and the 300 x 300 cropped images?
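To put numbers on the "blur" concern, here's a quick back-of-envelope sketch (the file name and Waldo's ~30 x 45 px source size are just my guesses):

```python
from PIL import Image

SSD_INPUT = 300  # SSD's fixed input resolution

puzzle = Image.open("puzzle.jpg")  # e.g. 2000 x 2000
scale_x = SSD_INPUT / puzzle.width
scale_y = SSD_INPUT / puzzle.height

# If Waldo is roughly 30 x 45 px in the source, after resizing he becomes:
print(f"~{30 * scale_x:.1f} x {45 * scale_y:.1f} px")  # ~4.5 x 6.8 px on 2000 x 2000
```

At that scale he's only a few pixels tall, which is what makes me doubt the full-size images are usable as-is.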
I'm considering augmenting the data by cropping these larger images at the locations that contain Wally, so that I get 300 x 300 images where Wally is actually visible rather than reduced to a smudge on the page - is this a good idea? (A sketch of the cropping I have in mind is below.) My current assumption is that SSD does train on the 2000 x 2000 image, but that the FPS drops by a lot - is this wrong? I also feel that if I don't use the 2000 x 2000 images for training, then at the prediction stage, when I start feeding the model images with large dimensions (actual puzzle images), it won't be able to predict locations accurately - is this not the case?
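For reference, this is roughly the crop-based augmentation I'm planning. It's a minimal sketch: the `annotations` dict, the file names, and the (x, y, w, h) box format are placeholders for however the dataset is actually labelled, and it assumes each puzzle is at least 300 px on each side and that Wally's box fits inside a crop.

```python
import random
from PIL import Image

CROP = 300  # target crop size, matching SSD's input resolution

def crop_around_box(img, box, crop=CROP):
    """Cut a crop x crop patch that fully contains the (x, y, w, h) box,
    jittering the patch position so Wally isn't always dead-centre."""
    x, y, w, h = box
    # Valid range for the patch's top-left corner: the box must stay inside
    # the patch, and the patch must stay inside the image.
    left = random.randint(max(0, x + w - crop), min(x, img.width - crop))
    top = random.randint(max(0, y + h - crop), min(y, img.height - crop))
    patch = img.crop((left, top, left + crop, top + crop))
    # Re-express the box relative to the new patch
    return patch, (x - left, y - top, w, h)

# Hypothetical annotations: file name -> Wally's (x, y, w, h) box
annotations = {"puzzle_01.jpg": (1240, 860, 30, 45)}

for fname, box in annotations.items():
    img = Image.open(fname)
    for i in range(5):  # several jittered crops per puzzle
        patch, new_box = crop_around_box(img, box)
        patch.save(f"{fname.rsplit('.', 1)[0]}_crop{i}.jpg")
        print(fname, "->", new_box)
```

The jitter matters to me because if Wally always sits at the centre of the crop, I'd expect the model to learn that positional bias rather than what Wally actually looks like.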