
I have a classification scenario with more than 10 classes where one class is a dedicated "garbage" class. With a CNN I currently reach accuracies around 96%, which is good enough for me.

In this particular application, false positives (a "garbage" sample recognized as any non-garbage class) are much worse than confusions between the non-garbage classes or false negatives (a non-garbage sample recognized as "garbage"). To reduce these false positives I am looking for a suitable loss function.

My first idea was to use categorical crossentropy and add a penalty value whenever a false positive occurs (pseudocode):

loss = categorical_crossentropy(y_true, y_pred) + weight * penalty
penalty = 1 if (y_true == "garbage" and y_pred != "garbage") else 0

My Keras implementation is:

from keras import backend as K

def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    # true label is the garbage class
    ref_is_garbage = K.equal(K.argmax(y_true), garbage_id)
    # predicted label is any non-garbage class
    hyp_not_garbage = K.not_equal(K.argmax(y_pred), garbage_id)
    # logical AND: this sample is a false positive
    penalty_ind = K.all(K.stack([ref_is_garbage, hyp_not_garbage], axis=0), axis=0)
    penalty = K.cast(penalty_ind, dtype='float32')
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty

I tried different values for weight, but I was not able to reduce the false positives. For small values the penalty has no effect at all (as expected), and for very large values (e.g. weight = 50) the network only ever recognizes a single class.

  • Is my approach complete nonsense, or should it in theory work? (It's my first time working with a non-standard loss function.)

  • Are there other/better ways to penalize such false positive errors? Sadly, most articles focus on binary classification and I could not find much for the multiclass case.

Edit:

As stated in the comments, the penalty above is not differentiable and therefore has no effect on the training updates. This was my next attempt:

def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
    # combined score of all non-garbage classes (softmax outputs sum to 1)
    ngs = 1. - y_pred[:, garbage_id]
    # non-zero only for garbage samples; grows as the predicted
    # garbage probability (1 - ngs) shrinks
    penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
    return K.categorical_crossentropy(y_true, y_pred) + weight * penalty

Here the combined score of all non-garbage classes is added for every sample of the minibatch that is a false positive; for samples that are not false positives, the penalty is 0.
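
A minimal sketch (my addition, not from the original post) of how such a loss can be plugged into Keras: compile() expects a function of (y_true, y_pred) only, so garbage_id and weight can be fixed with a closure. The factory name make_penalized_cross_entropy is illustrative.

from keras import backend as K

def make_penalized_cross_entropy(garbage_id=0, weight=1.0):
    # returns a loss function of (y_true, y_pred) with garbage_id and weight baked in
    def loss(y_true, y_pred):
        ngs = 1. - y_pred[:, garbage_id]
        penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
        return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
    return loss

# model.compile(optimizer='sgd',
#               loss=make_penalized_cross_entropy(garbage_id=5, weight=3.0),
#               metrics=['accuracy'])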

I tested the implementation on MNIST with a small feedforward network and the SGD optimizer, using class "5" as "garbage":

  • With just the crossentropy the accuracy is around 0.9343 and the "false positive rate" (class "5" images recognized as something else) is 0.0093.

  • With the penalized crossentropy (weight 3.0), the accuracy is 0.9378 and the false positive rate is 0.0016.

So apparently this works; however, I am not sure if it is the best approach. Also, the Adam optimizer does not work well with this loss function, which is why I had to use SGD.
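
For reference, a minimal sketch of the kind of setup described above (my own illustration, not the original experiment): a small feedforward network compiled with make_penalized_cross_entropy from the sketch above, SGD, and class 5 treated as "garbage". The layer sizes, number of epochs, and the normalization of the false-positive rate (here over the full test set) are assumptions.

import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd',
              loss=make_penalized_cross_entropy(garbage_id=5, weight=3.0),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)

# "false positive rate": garbage images (class 5) recognized as something else
pred = np.argmax(model.predict(x_test), axis=1)
true = np.argmax(y_test, axis=1)
print('accuracy:', np.mean(pred == true))
print('false positive rate:', np.mean((true == 5) & (pred != 5)))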

Johannes
    Does this implementation really work? Some of the operations that you are using seem to be non-differentiable (e.g. argmax) – rvinas Jun 27 '19 at 20:10
  • Have you considered the sample_weights argument of the fit() method? That may be an alternative to defining your own loss. It seems that you want to multiply the error by a weight. Wouldn't the weight add op be a no-op when computing the gradients? You want the weight to affect the gradients. – Pedro Marques Jun 28 '19 at 07:07 (see the sketch after these comments)
  • Keras did not complain about the missing gradient. I suspect this is because the crossentropy has a valid gradient and the penalty's "None" gradient might then be ignored... I edited the question. – Johannes Jul 02 '19 at 08:08
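
A sketch of the alternative Pedro Marques suggests (my illustration; in Keras the fit() argument is named sample_weight): give the garbage samples a larger weight so their errors count more in the loss, instead of writing a custom loss. The factor 3.0 and garbage_id = 5 are illustrative, and y_train is assumed to be one-hot encoded.

import numpy as np

garbage_id = 5
# weight garbage samples more heavily than everything else
weights = np.where(np.argmax(y_train, axis=1) == garbage_id, 3.0, 1.0)

# model.fit(x_train, y_train, sample_weight=weights, epochs=10, batch_size=128)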

0 Answers