I have a classification scenario with more than 10 classes where one class is a dedicated "garbage" class. With a CNN I currently reach accuracies around 96%, which is good enough for me.
In this particular application, false positives (a true "garbage" sample recognized as some non-garbage class) are a lot worse than confusions among the non-garbage classes or false negatives (a non-garbage sample recognized as "garbage"). To reduce these false positives I am looking for a suitable loss function.
My first idea was to use categorical crossentropy and add a penalty value whenever a false positive occurs (pseudocode):
    loss = categorical_crossentropy(y_true, y_pred) + weight * penalty
    penalty = 1 if (y_true == "garbage" and y_pred != "garbage") else 0
My Keras implementation is:
    from keras import backend as K

    def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
        # true label is the garbage class
        ref_is_garbage = K.equal(K.argmax(y_true), garbage_id)
        # predicted label is anything but the garbage class
        hyp_not_garbage = K.not_equal(K.argmax(y_pred), garbage_id)
        # logical and: indicator is 1 for every false positive, 0 otherwise
        penalty_ind = K.all(K.stack([ref_is_garbage, hyp_not_garbage], axis=0), axis=0)
        penalty = K.cast(penalty_ind, dtype='float32')
        return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
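To pass it to compile I wrap the extra arguments so the loss keeps the plain (y_true, y_pred) signature Keras expects. Roughly like this (make_penalized_ce is just a helper name I made up; model is the CNN):

    def make_penalized_ce(garbage_id=0, weight=1.0):
        # close over the extra arguments; Keras calls loss(y_true, y_pred)
        def loss(y_true, y_pred):
            return penalized_cross_entropy(y_true, y_pred, garbage_id, weight)
        return loss

    model.compile(optimizer='sgd',
                  loss=make_penalized_ce(garbage_id=0, weight=1.0),
                  metrics=['accuracy'])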
I tried different values for weight, but I was not able to reduce the false positives. For small values the penalty has no effect at all (as expected), and for very large values (e.g. weight = 50) the network only ever recognizes a single class.
Is my approach complete nonsense, or should it work in theory? (It's my first time working with a non-standard loss function.)
Are there other/better ways to penalize such false positive errors? Sadly, most articles focus on binary classification and I could not find much for a multiclass case.
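The closest built-in alternative I am aware of is Keras's class_weight, which upweights the whole loss for every sample of the garbage class rather than targeting false positives specifically. A sketch (num_classes, x_train and y_train are placeholders; labels are one-hot):

    # upweight the garbage class (id 0) relative to all other classes
    class_weight = {c: (5.0 if c == 0 else 1.0) for c in range(num_classes)}
    model.fit(x_train, y_train, class_weight=class_weight,
              epochs=10, batch_size=128)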
Edit:
As pointed out in the comments, the penalty above is not differentiable: the argmax-based indicator is piecewise constant, so its gradient is zero almost everywhere and it has no effect on the training updates.
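A quick way to see this with the TensorFlow backend (x is just a hypothetical input tensor):

    from keras import backend as K
    import tensorflow as tf

    x = K.placeholder(shape=(None, 3))
    hyp = K.cast(K.argmax(x), 'float32')
    # TensorFlow defines no gradient for argmax, so nothing flows back to x
    print(tf.gradients(K.sum(hyp), x))  # prints [None]

This was my next attempt: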
    def penalized_cross_entropy(y_true, y_pred, garbage_id=0, weight=1.0):
        # non-garbage score: combined score of all non-garbage classes
        ngs = 1. - y_pred[:, garbage_id]
        # nonzero only for samples whose true label is the garbage class
        penalty = y_true[:, garbage_id] * ngs / (1. - ngs)
        return K.categorical_crossentropy(y_true, y_pred) + weight * penalty
Here, for every minibatch sample whose true label is "garbage", the combined score of all non-garbage classes, divided by the garbage score, is added as a penalty. For samples whose true label is not "garbage", the penalty is 0.
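A quick sanity check of the penalty term with hand-picked numbers (TensorFlow backend, garbage_id = 0):

    from keras import backend as K

    y_true = K.constant([[1., 0., 0.],     # true garbage
                         [0., 1., 0.]])    # true non-garbage
    y_pred = K.constant([[0.2, 0.5, 0.3],  # garbage sample, mostly misclassified
                         [0.1, 0.8, 0.1]])

    ngs = 1. - y_pred[:, 0]
    penalty = y_true[:, 0] * ngs / (1. - ngs)
    print(K.eval(penalty))  # -> [4., 0.] up to rounding: 0.8 / 0.2, and 0 otherwise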
I tested the implementation on MNIST with a small feedforward network and the SGD optimizer, using class "5" as "garbage".
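The setup was roughly the following (layer sizes and epoch count are illustrative rather than the exact values I used; make_penalized_ce is the wrapper from above, now closing over the second loss):

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.utils import to_categorical

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype('float32') / 255.
    x_test = x_test.reshape(-1, 784).astype('float32') / 255.
    y_train = to_categorical(y_train, 10)
    y_test = to_categorical(y_test, 10)

    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='sgd',
                  loss=make_penalized_ce(garbage_id=5, weight=3.0),
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=10, batch_size=128,
              validation_data=(x_test, y_test))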
With just the crossentropy the accuracy is around 0.9343 and the "false positive rate" (class "5" images recognized as something else) is 0.0093.
With the penalized cross entropy (weight = 3.0) the accuracy is 0.9378 and the false positive rate is 0.0016.
So apparently this works, but I am not sure whether it's the best approach. Also, the Adam optimizer does not work well with this loss function, which is why I had to use SGD.