Pre-trained networks like VGG16 or Inception usually work with low-resolution inputs, typically under 500 px.
Is it possible to add a high-resolution convolution layer (or two) before the very first layer of a pre-trained VGG16 / Inception so that the network can consume high-resolution pictures?
As far as I know, the first layers are the hardest to train; they took a lot of data and compute to learn.
I wonder if it would be possible to freeze the pre-trained network and train only the newly attached high-resolution layers on an average GPU with about 3,000 examples. Could that be done in a couple of hours?
Also, if you know of any examples of image classification with high-resolution images, please share a link.
P.S.
The problem with the usual downscaling approach is that in our case tiny details like small cracks or dirt specks are very important, and they are lost at lower resolutions.