A deep learning model like Inception has the capacity to learn these features given enough data, so you shouldn’t need to crop to the features you think are useful for differentiating the two classes. Ideally, one or more of the convolutional filters will learn to detect certain shapes in the snout and allow the model to classify correctly.
You shouldn’t have unreasonable expectations for the model, though. If the visual data alone isn’t enough for an expert to distinguish certain alligators from crocodiles, you shouldn’t expect the model to do much better. You should establish a human baseline performance and use it for comparison.
As with all models, data quality and quantity are the most important factors. I would also strongly advise you to look into transfer learning: using weights that have been learnt on much larger datasets as a starting point. Check out this blog post for an example. You can train the fully connected layers at the end of the model to differentiate alligators from crocodiles, and even fine-tune the convolutional layers for improved performance.
You can get started with transfer learning easily with MXNet Gluon. In the snippet below, we’re transferring the weights from an Inception v3 model that’s already been trained on ImageNet (with 1000 classes) to a very similar model for binary classification (identical apart from the last layers). You can then train this network with your own data.
import mxnet as mx

# Load Inception v3 with weights pretrained on ImageNet (1000 classes).
pretrained_net = mx.gluon.model_zoo.vision.get_model(name='inceptionv3', pretrained=True, classes=1000, prefix='aligcroc_')
# Create an identical network, but with a 2-class output layer for the binary task.
net = mx.gluon.model_zoo.vision.get_model(name='inceptionv3', classes=2, prefix='aligcroc_')
# Transfer the pretrained convolutional features, and initialize only the new output layer.
net.features = pretrained_net.features
net.output.initialize()
# Forward a dummy batch (Inception v3 expects 299x299 inputs) to trigger
# deferred shape inference and complete the initialization.
batch_size = 1
channels = 3
height = width = 299
data_batch = mx.ndarray.random.normal(shape=(batch_size, channels, height, width))
net(data_batch)