
I know there are many questions about custom loss functions in Keras, but I've been unable to answer this one even after three hours of googling.

Here is a very simplified example of my problem. I realize this example is pointless, but I provide it for simplicity; what I actually need to implement is more complicated.

import tensorflow as tf
from keras.backend import binary_crossentropy
from keras.backend import mean

def custom_loss(y_true, y_pred):
    # Indices of the samples labeled 0 and of those labeled 1
    zeros = tf.zeros_like(y_true)
    index_of_zeros = tf.where(tf.equal(zeros, y_true))
    ones = tf.ones_like(y_true)
    index_of_ones = tf.where(tf.equal(ones, y_true))

    # Predictions corresponding to each class
    zero = tf.gather(y_pred, index_of_zeros)
    one = tf.gather(y_pred, index_of_ones)

    # Cross-entropy for each class separately
    loss_0 = binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = binary_crossentropy(tf.ones_like(one), one)

    # Mean over all per-sample losses
    return mean(tf.concat([loss_0, loss_1], axis=0))

I do not understand why training the network with the above loss function on a two-class dataset does not yield the same result as training with the built-in binary cross-entropy loss function. Thank you!

EDIT: I edited the code snippet to include the mean as per the comments below. I still get the same behavior, however.

zii
  • How is the result different? Completely different, or not? – sdcbr Jan 25 '19 at 19:03
  • Different accuracy, different predictions? – mickey Jan 25 '19 at 19:04
  • Yes, it goes from 83% acc. using the built-in function to 55% acc. using my function (same random seeds) – zii Jan 25 '19 at 19:07
  • @mickey Both, different accuracy and predictions. – zii Jan 25 '19 at 19:08
  • `tf.concat` just concatenates the two objects, you need to combine them somehow (maybe with `tf.mean`?) so you get the correct loss. – mickey Jan 25 '19 at 19:09
  • So the output of the loss function is a scalar? Or a tensor of the same shape as `y_true`? I am confused because this [binary cross entropy loss](https://www.tensorflow.org/api_docs/python/tf/keras/backend/binary_crossentropy) uses this [function](https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits), which returns the same shape as the input. – zii Jan 25 '19 at 19:15
  • @mickey I just tried `tf.reduce_mean`, which does not produce an error, but again the same 55% accuracy. – zii Jan 25 '19 at 19:19
  • Try `metrics = ['binary_accuracy']` in the compile statement. My experience has been that with a custom loss function (especially for binary classification) the accuracy metric gets messed up, and changing `metrics` has proven useful. – mickey Jan 25 '19 at 19:32
  • @mickey I still get the same 55%. I would imagine if the predictions are different then there's something fundamentally wrong. I am plotting the decision boundary and it's totally messed up when I use the custom loss. – zii Jan 25 '19 at 19:36
  • @zii Agreed, I suspect the calculation of the loss is just wrong. Perhaps you need to take the means of `loss_0` and `loss_1` separately and then add them together? Taking the straight mean of the two wouldn't account for any class imbalance, no matter how small. – mickey Jan 25 '19 at 20:06
  • @mickey I tried that too, but it doesn't work. And yes, the loss functions do not match, but I really have no idea why. – zii Jan 26 '19 at 00:32

1 Answer


I finally figured it out. The `tf.where` function behaves very differently when the tensor's shape is "unknown". To fix the snippet above, simply insert the following two lines at the start of the function body:

# Flatten both tensors to 1-D so tf.where returns single indices
# that tf.gather can use to select individual predictions
y_pred = tf.reshape(y_pred, [-1])
y_true = tf.reshape(y_true, [-1])
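
For completeness, here is a minimal sketch of the full corrected loss with that fix applied (same imports and names as in the question). Once both tensors are flattened to 1-D, `tf.where` returns single element indices, so `tf.gather` picks out individual predictions rather than rows:

import tensorflow as tf
from keras.backend import binary_crossentropy
from keras.backend import mean

def custom_loss(y_true, y_pred):
    # Flatten to 1-D so the shapes are fully known
    y_pred = tf.reshape(y_pred, [-1])
    y_true = tf.reshape(y_true, [-1])

    # Indices of the 0-labeled and 1-labeled samples
    index_of_zeros = tf.where(tf.equal(tf.zeros_like(y_true), y_true))
    index_of_ones = tf.where(tf.equal(tf.ones_like(y_true), y_true))

    # Per-class predictions and cross-entropies
    zero = tf.gather(y_pred, index_of_zeros)
    one = tf.gather(y_pred, index_of_ones)
    loss_0 = binary_crossentropy(tf.zeros_like(zero), zero)
    loss_1 = binary_crossentropy(tf.ones_like(one), one)

    return mean(tf.concat([loss_0, loss_1], axis=0))

Since every sample now lands in exactly one of the two groups, this should match the built-in binary cross-entropy on a two-class problem.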
zii