10

I am new to machine learning and am currently trying to train a convolutional neural net with 3 convolutional layers and 1 fully connected layer. I am using a dropout probability of 25% and a learning rate of 0.0001. I have 6000 150x200 training images and 13 output classes. I am using TensorFlow. I am noticing a trend where my loss steadily decreases, but my accuracy increases only slightly and then drops back down again. In my loss and accuracy plots, the blue lines are the training set and the orange lines are the validation set; the x axis is training steps.

I am wondering if there is something I am not understanding, or what the possible causes of this phenomenon could be. From the material I have read, I assumed low loss meant high accuracy. Here is my loss function:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
Sam K
  • Ever heard of *overfitting*? – sascha Aug 02 '16 at 19:30
  • Low training loss should mean low training-set error. How low is your loss? Your scale is in millions; it's not clear from the graph that your training loss is low (less than 1). – Yaroslav Bulatov Aug 02 '16 at 19:52
  • Yes, I have heard of overfitting, but I was under the assumption that if you are overfitting you would still have high accuracy on your training data. Sorry about the scale; my loss was between 1 and 10 when I finished training. – Sam K Aug 02 '16 at 20:13
  • Accuracy is known as "0-1" loss, whereas people typically minimize cross-entropy loss. Those losses are connected: zero cross-entropy loss implies 100% accuracy, and there are bounds on accuracy in terms of cross-entropy, so low cross-entropy implies high accuracy. Most typically, your kind of scenario indicates a bug in the loss function. – Yaroslav Bulatov Aug 02 '16 at 20:50
  • What loss function are you using? – Dr. Snoopy Aug 03 '16 at 01:19
  • This is my loss function: cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y)), where pred is the prediction array and y is the array containing the correct labels. The arrays are 128x13 since I have batches of size 128 and 13 classes. – Sam K Aug 03 '16 at 01:44
  • I am seeing a similar issue with Keras and TensorFlow, where training loss goes to near zero while categorical accuracy is stuck nowhere near 100%. I'm curious whether you might also be using a custom weight function? I am monitoring multiple statistics to try to track this down. I am seeing a reasonable accuracy statistic (> 60%), but top_k_categorical_accuracy shows zero with k=3. – mikeTronix Feb 10 '18 at 17:06

2 Answers

8

That is because loss and accuracy are two totally different things (well, at least logically)!

Consider an example where you have defined loss as:

loss = (1-accuracy)

In this case, when you try to minimize the loss, the accuracy increases automatically.

Now consider another example where you define loss as:

loss = average(prediction_probabilities)

Though it does not make much sense, it is technically still a valid loss function, and your weights are still tuned to minimize it.

But as you can see, in this case there is no relation between loss and accuracy, so you cannot expect both to increase or decrease at the same time.
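
To make this concrete, here is a minimal NumPy sketch (the logits and labels are made up purely for illustration): two sets of predictions that pick exactly the same classes, and therefore have identical accuracy, can still have very different cross-entropy losses.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def cross_entropy(logits, labels):      # labels are one-hot
        return -np.mean(np.sum(labels * np.log(softmax(logits)), axis=1))

    def accuracy(logits, labels):
        return np.mean(logits.argmax(axis=1) == labels.argmax(axis=1))

    labels = np.eye(3)[[0, 1, 2]]            # three examples, three classes

    confident = np.array([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.]])
    hesitant  = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])

    # Both sets of logits pick the correct class every time, so accuracy is 1.0
    # for both, yet the cross-entropy loss differs by more than an order of magnitude.
    print(accuracy(confident, labels), cross_entropy(confident, labels))  # 1.0, ~0.01
    print(accuracy(hesitant, labels),  cross_entropy(hesitant, labels))   # 1.0, ~0.55

The reverse can also happen: the loss can keep falling because the network becomes more confident on examples it already gets right, while the number of correctly classified examples barely moves.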

Note: the optimizer is always minimizing the loss (thus your loss keeps decreasing from iteration to iteration), whatever the accuracy does!

PS: Please update your question with the loss function you are trying to minimize.

exAres
  • The OP has commented that they are using multiclass log-loss on a softmax output. – Neil Slater Aug 03 '16 at 07:26
  • @Sangram Hey! I was wondering: if `loss = average(prediction_probabilities)` is minimized, that means my `prediction_probabilities` are getting closer to the ground truth, right? Doesn't that make my accuracy better? – deeplearning Nov 08 '17 at 20:15
  • Not really! If you try to minimize **loss = average(prediction_probabilities)**, the weights will be tuned in such a way that the network output tends toward zero, and this has nothing to do with accuracy. If the network outputs exactly zero prediction probability for a particular class (say the positive class), the accuracy is just the prevalence of that class. – exAres Nov 09 '17 at 10:34
1

softmax_cross_entropy_with_logits() and accuracy are two different concepts with different mathematical definitions. In normal cases we can expect to get higher accuracy by minimizing the softmax cross-entropy, but they are computed in different ways, so we cannot expect them to always increase or decrease in sync.

We use softmax cross-entropy in CNNs because it is effective for neural network training. If we use loss = (1-accuracy) as the loss function, it is very difficult to get better results by adjusting the weights with our current, mature backpropagation training methods. I actually tried this and confirmed that conclusion; you can try it yourself. Maybe this is caused by limitations of current backpropagation training, or maybe by how our neurons are defined (do we need to change to some other type of neuron?), but in any case, using accuracy in the loss function is currently not an effective way to train a neural network. So just use softmax_cross_entropy_with_logits(), as the AI researchers recommend; they have already confirmed that this approach works, and we don't yet know effective alternatives.
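
To illustrate why an accuracy-style loss cannot be trained with backpropagation, here is a minimal TensorFlow 1.x-style sketch (the placeholder shapes, variable names, and single-layer model are my own simplifications, not the asker's network): accuracy is built from argmax/equal, which have no registered gradients, so the optimizer has nothing to work with, whereas the cross-entropy loss yields usable gradients.

    import tensorflow as tf

    # Hypothetical single-layer model with 13 classes, as in the question.
    x = tf.placeholder(tf.float32, [None, 512])   # flattened features (made-up size)
    y = tf.placeholder(tf.float32, [None, 13])    # one-hot labels
    W = tf.Variable(tf.truncated_normal([512, 13], stddev=0.1))
    b = tf.Variable(tf.zeros([13]))
    pred = tf.matmul(x, W) + b                    # logits

    # Differentiable training objective: softmax cross-entropy.
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))

    # Monitoring metric: accuracy, built from non-differentiable ops.
    correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    acc = tf.reduce_mean(tf.cast(correct, tf.float32))

    print(tf.gradients(cost, [W]))   # [<Tensor ...>] -> gradient exists, backprop can proceed
    print(tf.gradients(acc, [W]))    # [None]         -> no gradient, nothing to optimize

In practice this is why a smooth surrogate such as cross-entropy is used as the training objective, while accuracy is only reported as a monitoring metric.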

Clock ZHONG
  • I'm currently taking a new online training course and have a new understanding of why we need to use softmax_cross_entropy_with_logits(). The reason is very simple: the softmax cross-entropy function is a convex function, but most other functions are not. So we can find the global minimum by finding a local minimum of a convex function. But a non-convex function, e.g. loss = (1-accuracy), has multiple local minima, so it's impossible to find suitable W & b values with our backpropagation algorithms on it. – Clock ZHONG Oct 03 '17 at 15:26