Neural Network Underfitting with Dogs and Cats

Question

Without necessarily getting into the code of it, but focusing more on the principles, I have a question about what I assume would be underfitting.

Suppose I am training a network to decide whether an image shows a dog. I have about 40,000 images, with every dog image labeled 1 and every other image labeled 0. If only about 5,000 of those images are dogs, what can I do to make sure the network does not learn “lazily” from this imbalance and score dogs closer to 0 than to 1?

The main purpose is to recognize with high accuracy whether an image really is of a dog; I don't care much about the other images beyond the fact that they are not dogs. I would also like to keep the probability that the guess is correct, because this is highly important for my purposes.

The only two things I was able to come up with were to:

1. Have more nodes in the network, or
2. Have half of the images be of dogs (so use 10,000 images where 5,000 of them are dogs).

But I think the 2nd option might give dogs a disproportionately large chance of being the output on the test data, which would hurt the accuracy and defeat the whole purpose of this network.

I am sure this has been addressed before, so even a point in the right direction would be highly appreciated!

score 0 · Accepted Answer · answered Jun 29 '18 at 20:05


So you have a binary classification task where the two classes appear with different frequencies in your dataset: 5,000 of 40,000 images, or about 1/8, are "dog" and the remaining 7/8 are "no dog".

To avoid biasing the model towards either class, it is important that you stratify your training, validation and test data so that these fractions are preserved in every subset.
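
As a rough illustration of what stratifying means here, the sketch below splits label indices per class so each subset keeps the 1/8 dog share. The function name `stratified_split`, the 70/15/15 fractions and the fixed seed are assumptions made for this example, not something from the question. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument does the same job.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep each class's share.

    The fractions and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        # Split every class with the same fractions, so class shares carry over.
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# 5,000 dogs (label 1) and 35,000 non-dogs (label 0), as in the question.
labels = [1] * 5000 + [0] * 35000
train, val, test = stratified_split(labels)

for name, subset in (("train", train), ("val", val), ("test", test)):
    dog_share = sum(labels[i] for i in subset) / len(subset)
    print(name, len(subset), dog_share)
```

Each subset ends up with the same fraction of dogs as the full dataset, so validation and test metrics reflect the class balance the model was trained on.
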
You say that you want to "retain the probability" that the guess is correct - I assume you mean that you want the probability of "dogness" as the output variable. That is a simple softmax output layer with two outputs: the first is "dog", the second "not dog". It is the typical way to address classification problems, regardless of the number of classes you need to distinguish.
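
To make the two-output softmax concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration; in a real network they would be the raw outputs of the last layer. The first returned value can be read as the model's "dog" probability.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for one image: [dog score, not-dog score].
logits = [2.0, -1.0]
p_dog, p_not_dog = softmax(logits)

print("P(dog) =", p_dog)
print("P(not dog) =", p_not_dog)
```

With only two classes this is equivalent to a single sigmoid output applied to the difference of the two scores, so either design yields a usable "dogness" probability.
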


ascripter


Thanks! Could it also be a problem with underfitting if it is not good at recognizing dogs, and spits out an answer closer to 0 than 1 every time it is tested on dogs? – Jmeeks29ig Jun 30 '18 at 20:13
If you keep having a problem fitting your model, I'd rather say it might either be (1) a problem with image data preparation, i.e. do you [standardize](https://en.wikipedia.org/wiki/Feature_scaling) your data well, or (2) a problem with your overall architecture - in this case I'd say stick with proven models from the literature instead of creating your own (if you're not already doing this anyway) – ascripter Jul 01 '18 at 08:47
Thanks! I appreciate it. – Jmeeks29ig Jul 07 '18 at 17:37
