
There is a general question on this at Image classification and image resizing, but the present question is specifically about prediction with Azure Cognitive Services Custom Vision using a Python/TensorFlow exported model.

Following https://learn.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/export-model-python, which provides the helper functions used below, the recommended operations for preparing an arbitrarily shaped image for local (exported-model) prediction are:

    # Update orientation based on EXIF tags, if the file has orientation info.
    image = update_orientation(image)

    # Convert to OpenCV format
    image = convert_to_opencv(image)

    # If the image has either w or h greater than 1600 we resize it down respecting
    # aspect ratio such that the largest dimension is 1600
    image = resize_down_to_1600_max_dim(image)

    # We next get the largest center square
    h, w = image.shape[:2]
    min_dim = min(w,h)
    max_square_image = crop_center(image, min_dim, min_dim)

    # Resize that square down to 256x256(*)
    augmented_image = resize_to_256_square(max_square_image)

[(*) Note that for currently exported models this should actually be 224×224 or 227×227... YOLO under the hood?]
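For reference, the geometric helpers used above do roughly the following (my own paraphrase of the linked doc, lightly tidied - not a verbatim copy):

    import cv2

    def resize_down_to_1600_max_dim(image):
        # Shrink (never enlarge) so the largest dimension becomes 1600,
        # preserving aspect ratio
        h, w = image.shape[:2]
        if h < 1600 and w < 1600:
            return image
        new_size = (1600 * w // h, 1600) if h > w else (1600, 1600 * h // w)
        return cv2.resize(image, new_size, interpolation=cv2.INTER_LINEAR)

    def crop_center(img, cropx, cropy):
        # Take a cropx-by-cropy window from the middle of the image,
        # discarding everything outside it
        h, w = img.shape[:2]
        startx = w // 2 - (cropx // 2)
        starty = h // 2 - (cropy // 2)
        return img[starty:starty + cropy, startx:startx + cropx]

    def resize_to_256_square(image):
        # The input is already square at this point, so no distortion occurs
        return cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR)

The crop_center step is the one that discards the image edges.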

These steps are problematic, not least because they cut out the centre of the image and throw away the (in my case valuable) information from the edges. (The eventual resolution is also terrible, but I assume that is unavoidable.)
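For concreteness, the kind of padding alternative I have tried on the prediction side looks like this (a sketch; pad_to_square is my own helper, and black padding is an arbitrary choice):

    import cv2

    def pad_to_square(image, pad_value=0):
        # Letterbox: pad the shorter side with a constant border so that
        # no pixels are discarded
        h, w = image.shape[:2]
        size = max(h, w)
        top = (size - h) // 2
        bottom = size - h - top
        left = (size - w) // 2
        right = size - w - left
        return cv2.copyMakeBorder(image, top, bottom, left, right,
                                  cv2.BORDER_CONSTANT, value=pad_value)

    # Drop-in replacement for the crop step above:
    # square_image = pad_to_square(image)
    # augmented_image = resize_to_256_square(square_image)

In my experiments this does not perform as well as the documented crop pipeline - presumably because the (fixed, server-side) training preprocessing used the crop.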

Prediction: When an image is sent directly to an online Custom Vision prediction endpoint, it does not - judging by the resized image retrievable via the API - appear to be cropped. How is this achieved - is the image simply distorted to the required square?
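If so, the equivalent local step would be to skip the crop and squash the whole image straight to the network input size - something like the following (my guess at the behaviour, not documented anywhere I can find):

    import cv2

    # Non-uniform resize: the whole image survives, but the aspect ratio
    # is distorted for anything that is not already square
    squashed = cv2.resize(image, (256, 256), interpolation=cv2.INTER_LINEAR)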

Training: Is the 'resized image' the one actually used for training, or not? Or have all the training images been cropped as well? If so, should I resize them to square before uploading? If not, how is this achieved?

Thanks

(Posting here as recommended by the Azure team.)

Comments:

  • On the doc you mentioned, they say that "These steps mimic the image manipulation performed during training"... I agree with you, it's quite strange! I did not know about anything similar on the API side – Nicolas R Jun 28 '19 at 17:16
  • @NicolasR Good spot - but alarming if true... It means all the peripheral information is lost for an A4 document, to the extent that it's a miracle it works at all. I tried various strategies, such as padding, on the prediction side, but they don't do as well as the pipeline given above - and we have no control over their server-side embedded training pipeline. What to do? All I can think of is to pad to square before training, or to carve up the document first. I hope someone from Azure will chime in... As for the resolution... – jtlz2 Jun 29 '19 at 14:11
