
I'm working on a project that trains an ML model to predict the location of Waldo in a Where's Wally? image, using AWS SageMaker with Single Shot Detection (SSD) as the underlying object detection algorithm. My concern is that training on an actual puzzle image with dimensions like 2000 x 2000 isn't feasible, because SSD will auto-resize the image to 300 x 300, which would render Waldo a meaningless blur. Does SSD resize images automatically, or will it train on the 2000 x 2000 image? Should I crop and resize all puzzles to 300 x 300 images containing Waldo, or can I include a mix of actual puzzle images (2000+ x 2000+) and the 300 x 300 crops?

I'm considering augmenting the data by cropping the larger images at locations that contain Wally, so that I have 300 x 300 images where Wally isn't reduced to a smudge on the page and is actually visible - is this a good idea? My current assumption is that SSD does train on the 2000 x 2000 image, but that training speed (FPS) drops by a lot - is that wrong? I also feel that if I don't use the 2000 x 2000 images for training, then at prediction time, when I start feeding the model images with large dimensions (actual puzzle images), it won't be able to predict locations accurately - is that not the case?
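To make the cropping idea concrete, here is a rough sketch of what I have in mind (the file name and box coordinates are placeholders; the real Wally boxes would come from my annotation file):

```python
from PIL import Image

def crop_around_wally(image_path, wally_box, crop_size=300):
    """Crop a crop_size x crop_size window containing Wally and
    translate his bounding box into the crop's coordinate system.

    wally_box is (xmin, ymin, xmax, ymax) in the original image.
    """
    img = Image.open(image_path)
    xmin, ymin, xmax, ymax = wally_box

    # Centre the crop on Wally, then clamp it inside the image bounds.
    cx, cy = (xmin + xmax) // 2, (ymin + ymax) // 2
    left = max(0, min(cx - crop_size // 2, img.width - crop_size))
    top = max(0, min(cy - crop_size // 2, img.height - crop_size))

    crop = img.crop((left, top, left + crop_size, top + crop_size))

    # Bounding box relative to the crop, clamped to the crop edges.
    new_box = (
        max(0, xmin - left),
        max(0, ymin - top),
        min(crop_size, xmax - left),
        min(crop_size, ymax - top),
    )
    return crop, new_box

# Placeholder values for illustration only.
crop, box = crop_around_wally("puzzle_01.jpg", (1240, 870, 1275, 930))
crop.save("puzzle_01_crop.jpg")
```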

jonjitsu

1 Answer


SageMaker object detection resizes the image based on the hyperparameter "image_shape", which you can set to a size larger than 300 x 300. However, 2000 x 2000 is probably too large for the algorithm and will also slow down training, so try an image size somewhere in the middle. Cropping the larger images into smaller patches is a good idea for solving this problem. For inference, the input image will also be resized to the size given by "image_shape", so you may want to crop or resize large images before you send them to the endpoint.
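A rough sketch of this with the SageMaker Python SDK (v2) is below. The role, bucket, endpoint name and hyperparameter values are placeholders you would replace with your own; the key points are setting image_shape on the estimator and resizing the puzzle before invoking the endpoint:

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
container = image_uris.retrieve("object-detection", session.boto_region_name)

od_model = Estimator(
    container,
    role="<your-sagemaker-execution-role>",   # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://your-bucket/output",    # placeholder
    sagemaker_session=session,
)

od_model.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,
    num_classes=1,               # just "wally"
    image_shape=512,             # larger than the 300 default, well below 2000
    mini_batch_size=16,
    epochs=30,
    num_training_samples=1000,   # replace with your actual dataset size
)
```

And at prediction time, resize (or crop) the full puzzle down to the same size before sending it:

```python
import io
import boto3
from PIL import Image

runtime = boto3.client("sagemaker-runtime")

img = Image.open("puzzle_01.jpg").resize((512, 512))  # match image_shape
buf = io.BytesIO()
img.save(buf, format="JPEG")

response = runtime.invoke_endpoint(
    EndpointName="wally-detector",   # placeholder endpoint name
    ContentType="image/jpeg",
    Body=buf.getvalue(),
)
```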