A deep learning model like Inception has the capacity to learn these features given enough data, so you shouldn’t need to crop to the features you think are useful for differentiating the two classes. Ideally, one or more of the convolutional filters will learn to detect certain shapes in the snout and allow the model to classify correctly.
You shouldn’t have unreasonable expectations for the model, though. If the visual data alone isn’t enough for an expert to distinguish certain alligators from crocodiles, you shouldn’t expect the model to do much better. You should establish a human baseline performance and use it for comparison.
As with all models, data quality and quantity are the most important factors. I would also strongly advise you to look into transfer learning: using weights that have been learnt on much larger datasets as a starting point. Check out this blog post for an example. You can train the fully connected layers at the end of the model to differentiate alligators from crocodiles, and even fine-tune the convolutional layers for improved performance.
You can get started with transfer learning easily with MXNet Gluon. In the snippet below, we’re transferring the weights from an Inception v3 model that’s already been trained on ImageNet (with 1000 classes) to a very similar model for binary classification (identical apart from the last layers). You can then train this network with your own data.
import mxnet as mx

# Load Inception v3 with weights pretrained on ImageNet (1000 classes).
pretrained_net = mx.gluon.model_zoo.vision.get_model(name='inceptionv3', pretrained=True, classes=1000, prefix='aligcroc_')
# Create an identical network, but with a 2-class output layer for the binary task.
net = mx.gluon.model_zoo.vision.get_model(name='inceptionv3', classes=2, prefix='aligcroc_')
# Transfer the pretrained convolutional features, and initialize only the new output layer.
net.features = pretrained_net.features
net.output.initialize()
# Forward a dummy batch (Inception v3 expects 299x299 inputs) to trigger
# deferred shape inference and complete the initialization.
batch_size = 1
channels = 3
height = width = 299
data_batch = mx.ndarray.random.normal(shape=(batch_size, channels, height, width))
net(data_batch)