I have a classification scenario with more than 10 classes where one class is a dedicated "garbage" class. With a CNN I currently reach accuracies around 96%, which is good enough for me.
In this particular application, false positives (a true "garbage" sample recognized as some non-garbage class) are a lot worse than confusions among the non-garbage classes or false negatives (a non-garbage sample recognized as "garbage"). To reduce these false positives I am looking for a suitable loss function.
My first idea was to use categorical crossentropy and add a penalty value whenever a false positive occurs (pseudocode):
    loss = categorical_crossentropy(y_true, y_pred) + weight * penalty
    penalty = 1 if (y_true == "garbage" and y_pred != "garbage") else 0
My Keras implementation is:
    from keras import backend as K

    def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
        # true label is the garbage class
        ref_is_garbage = K.equal(K.argmax(y_true), garbage_id)
        # predicted label is anything but the garbage class
        hyp_not_garbage = K.not_equal(K.argmax(y_pred), garbage_id)
        # logical and: indicator is 1 for every false positive, 0 otherwise
        penalty_ind = K.all(K.stack([ref_is_garbage, hyp_not_garbage], axis=0), axis=0)
        penalty = K.cast(penalty_ind, dtype='float32')
        return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
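To pass it to compile I wrap the extra arguments so the loss keeps the plain (y_true, y_pred) signature Keras expects. Roughly like this (make_penalized_ce is just a helper name I made up; model is the CNN):

    def make_penalized_ce(garbage_id=0, weight=1.0):
        # close over the extra arguments; Keras calls loss(y_true, y_pred)
        def loss(y_true, y_pred):
            return penalized_cross_entropy(y_true, y_pred, garbage_id, weight)
        return loss

    model.compile(optimizer='sgd',
                  loss=make_penalized_ce(garbage_id=0, weight=1.0),
                  metrics=['accuracy'])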
I tried different values for weight, but I was not able to reduce the false positives. For small values the penalty has no effect at all (as expected), and for very large values (e.g. weight = 50) the network only ever recognizes a single class.
Is my approach complete nonsense, or should it work in theory? (It's my first time working with a non-standard loss function.)
Are there other/better ways to penalize such false positive errors? Sadly, most articles focus on binary classification and I could not find much for a multiclass case.
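The closest built-in alternative I am aware of is Keras's class_weight, which upweights the whole loss for every sample of the garbage class rather than targeting false positives specifically. A sketch (num_classes, x_train and y_train are placeholders; labels are one-hot):

    # upweight the garbage class (id 0) relative to all other classes
    class_weight = {c: (5.0 if c == 0 else 1.0) for c in range(num_classes)}
    model.fit(x_train, y_train, class_weight=class_weight,
              epochs=10, batch_size=128)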
Edit:
As pointed out in the comments, the penalty above is not differentiable: the argmax-based indicator is piecewise constant, so its gradient is zero almost everywhere and it has no effect on the training updates.
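A quick way to see this with the TensorFlow backend (x is just a hypothetical input tensor):

    from keras import backend as K
    import tensorflow as tf

    x = K.placeholder(shape=(None, 3))
    hyp = K.cast(K.argmax(x), 'float32')
    # TensorFlow defines no gradient for argmax, so nothing flows back to x
    print(tf.gradients(K.sum(hyp), x))  # prints [None]

This was my next attempt: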
    def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
        # non-garbage score: combined score of all non-garbage classes
        ngs = 1. - y_pred[:, garbage_id]
        # nonzero only for samples whose true label is the garbage class
        penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
        return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
Here, for every minibatch sample whose true label is "garbage", the combined score of all non-garbage classes, divided by the garbage score, is added as a penalty. For samples whose true label is not "garbage", the penalty is 0.
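A quick sanity check of the penalty term with hand-picked numbers (TensorFlow backend, garbage_id = 0):

    from keras import backend as K

    y_true = K.constant([[1., 0., 0.],     # true garbage
                         [0., 1., 0.]])    # true non-garbage
    y_pred = K.constant([[0.2, 0.5, 0.3],  # garbage sample, mostly misclassified
                         [0.1, 0.8, 0.1]])

    ngs = 1. - y_pred[:, 0]
    penalty = y_true[:, 0] * ngs / (1. - ngs)
    print(K.eval(penalty))  # -> [4., 0.] up to rounding: 0.8 / 0.2, and 0 otherwise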
I tested the implementation on MNIST with a small feedforward network and the SGD optimizer, using class "5" as "garbage".
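The setup was roughly the following (layer sizes and epoch count are illustrative rather than the exact values I used; make_penalized_ce is the wrapper from above, now closing over the second loss):

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.utils import to_categorical

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)

    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd',
                  loss=make_penalized_ce(garbage_id=5, weight=3.0),
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=10, batch_size=128,
              validation_data=(x_test, y_test))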
With just the crossentropy the accuracy is around 0.9343 and the "false positive rate" (class "5" images recognized as something else) is 0.0093.
With the penalized cross entropy (weight = 3.0) the accuracy is 0.9378 and the false positive rate is 0.0016.
So apparently this works, but I am not sure whether it's the best approach. Also, the Adam optimizer does not work well with this loss function, which is why I had to use SGD.