
I am trying out the Gluon model zoo.

import mxnet as mx
from mxnet.gluon.model_zoo import vision
import cv2
import numpy as np

ctx = mx.gpu(6) # successful
net = vision.alexnet(pretrained=True, ctx=ctx)

# Prepare the input image.
# You may ignore this process; it just preprocesses an image for the net,
# loading the input as shape (batch=1, channel=3, height, width).
im = cv2.imread('img.jpg')  # 4032x3024 color image (OpenCV loads it as BGR)
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB).astype(float)/255
im = mx.image.color_normalize(im, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 
im = np.transpose(im, (2,0,1)) # (4032,3024,3) -> (3,4032,3024)
im = im[None,:] # (3,4032,3024) -> (1,3,4032,3024). this means batchsize=1
im = mx.nd.array(im, ctx=ctx)

# run 
r = net(im)

When I run this, an error occurs.

MXNetError: Shape inconsistent, Provided = [4096,9216], inferred shape=(4096,2976000)

Do I have to resize the image to a specific size? The manual says Gluon only requires minimum width and height. Do I also need to consider a maximum size, or fix the input size?

José Pereda
plhn

2 Answers


You need to fix the input size at 256x256, as this was the image size the AlexNet network was trained on, according to the original paper. Usually you achieve this by resizing the smaller axis (width or height) to 256 and then taking a center crop.
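A minimal, dependency-free sketch of the resize-then-center-crop step described above. The nearest-neighbor resize is a simplification (in practice you would use `cv2.resize` or `mx.image.resize_short`), and the 224 crop size is an assumption based on the comment discussion below about what the pretrained network actually expects:

```python
import numpy as np

def resize_shorter_side(im, short=256):
    # Nearest-neighbor resize so the shorter side becomes `short`.
    # This is a stand-in for cv2.resize, to keep the sketch dependency-free.
    h, w = im.shape[:2]
    scale = short / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return im[rows][:, cols]

def center_crop(im, size=224):
    # Crop a size x size patch from the middle of the image.
    h, w = im.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return im[top:top + size, left:left + size]

img = np.zeros((3024, 4032, 3), dtype=np.uint8)  # dummy stand-in for the 4032x3024 photo
out = center_crop(resize_shorter_side(img))
print(out.shape)  # (224, 224, 3)
```

After this step the normalization and HWC-to-CHW transpose from the question's code can be applied unchanged, and the batch dimension added.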

The thing is that when you use a neural network to predict something, you need to prepare your input data in exactly the same way as the training data. If you don't, in the simplest case a shape-mismatch error occurs. In a more complicated case, when the shapes match but the image is drastically different from what the model was trained on, the result will most certainly be wrong.
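You can see exactly where the two numbers in the error message come from by walking the conv/pool stack and computing the flattened feature size. This is a sketch; the (kernel, stride, pad) values are my reading of the Gluon model-zoo AlexNet definition, so treat them as assumptions:

```python
def out_size(size, kernel, stride, pad):
    # Spatial output size of a conv or pool layer (floor convention).
    return (size + 2 * pad - kernel) // stride + 1

def alexnet_flat_features(h, w):
    # (kernel, stride, pad) for each conv/pool layer in the feature extractor;
    # the last conv block has 256 output channels.
    layers = [
        (11, 4, 2),  # conv1
        (3, 2, 0),   # maxpool
        (5, 1, 2),   # conv2
        (3, 2, 0),   # maxpool
        (3, 1, 1),   # conv3
        (3, 1, 1),   # conv4
        (3, 1, 1),   # conv5
        (3, 2, 0),   # maxpool
    ]
    for k, s, p in layers:
        h, w = out_size(h, k, s, p), out_size(w, k, s, p)
    return 256 * h * w

print(alexnet_flat_features(224, 224))    # 9216 -> what the dense layer was trained with
print(alexnet_flat_features(4032, 3024))  # 2976000 -> the "inferred shape" in the error
```

The first dense layer's weights are fixed at 4096x9216, so any input whose conv features don't flatten to 9216 triggers exactly the reported `Shape inconsistent` error.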

Sergei
  • I resolved this problem with your kind advice. Thank you. (But Sergei, was AlexNet's original input size 256*256? I thought it was 224*224.) – plhn Nov 09 '18 at 05:56
  • I think it is 256x256. Here is the quote from the paper: "...Therefore, we down-sampled the images to a fixed resolution of 256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256×256 patch from the resulting image...". The paper is here https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf – Sergei Nov 09 '18 at 18:18

When I resized the input image to under 254*254, inference succeeded.

Maybe mxnet's pretrained AlexNet does not handle images of large size.

@Sergei's comment helped. Thank you.

plhn