I'm using Keras (TensorFlow backend) to build a model that, given an input, predicts a single class out of 64 classes (a multiclass model). Given the fairly large number of classes, I do not want to use the categorical_crossentropy or sparse_categorical_crossentropy loss functions, since they are based largely on the single predicted class.

For example: let y_true=0 (sparse), y_pred1=[0.4, 0.5, 0.01, 0.02, 0.01, ...] and y_pred2=[0.04, 0.05, 0.5, 0.4, 0.01, ...] (where y_pred1 and y_pred2 are predicted probability vectors). The losses above will produce the same loss for both, since the gold label 0 is not the top predicted class. In my case, I do want to separate the two predictions and give y_pred1 a much lower loss, since the gold label is among the top 5 predicted classes.

I tried to use a custom loss function, but with no success; an unclear exception was raised: ValueError: None values not supported.

import keras
from tensorflow.python.ops import math_ops

def in_top_k(y_true, y_pred):
    # True/False per sample: is the gold label among the top-5 predicted classes?
    return keras.backend.in_top_k(y_pred, math_ops.argmax(y_true, axis=-1), k=5)

model.compile(loss=in_top_k,
              optimizer='adam',
              metrics=['accuracy'])

where y_pred and y_true are [n,64] tensors.
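For context, the behaviour I expect from in_top_k, sketched in plain NumPy (in_top_k_np is an illustrative helper only, and the two example vectors are padded to 64 classes with near-zero filler): it answers, per sample, whether the gold label falls among the k highest-scored classes.

```python
import numpy as np

def in_top_k_np(y_true, y_pred, k=5):
    """Per row: is the gold label among the k highest scores?

    y_true: int array of shape [n] (sparse labels)
    y_pred: float array of shape [n, num_classes]
    """
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]  # indices of the k largest scores
    return np.array([t in row for t, row in zip(y_true, top_k)])

# The two predictions from the example above, padded out to 64 classes.
y_pred1 = np.array([0.4, 0.5, 0.01, 0.02, 0.01] + [0.0] * 59)
y_pred2 = np.array([0.04, 0.05, 0.5, 0.4, 0.01] + [0.0] * 59)
y_true = np.array([0, 0])

print(in_top_k_np(y_true, np.stack([y_pred1, y_pred2])))
# both rows True: label 0 is in the top 5 of each
```

Note the output is boolean, not a real-valued score.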

  • What could be causing that error?
  • Is my custom loss implementation correct?
  • Is my line of thinking correct?

Thanks!

Daniel Juravski
  • Is `y_true` one-hot or sparse? I really don't understand how you're doing that comparison of `y_true=0` with arrays of `y_pred` saying they're argmaxed. If they're argmaxed they're not an array anymore. I honestly don't understand what you're trying to achieve. – Daniel Möller Mar 29 '20 at 15:02
  • Hi @DanielMöller, I've edited the question a bit. `y_true` can be represented in both ways. I'll try to rephrase the issue another way: I want a lower loss on `y_pred1`, which almost predicted class 0 correctly, than on `y_pred2`, whose class-0 prediction seems far off. The losses mentioned above will give both predictions the same loss, since class 0 was not the top prediction. Thanks! – Daniel Juravski Mar 29 '20 at 16:07
  • I don't think this loss will help you. You're backed up by using "softmax", so the large number of classes will not be a problem. --- In the updated question, `y_pred1` and `y_pred2` will not have the same losses; the `y_pred2` loss will be way bigger, because class 0 is much further from 1 than in `y_pred1`. ---- By using `'softmax'` as the last activation, you're linking all classes; it doesn't matter that you're looking at only one. If it's low, huge loss: grow it, and this affects all other classes. If it's high, low loss: great. – Daniel Möller Mar 29 '20 at 17:33
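To make the numbers in the last comment concrete: cross-entropy with a one-hot target reduces to minus the log of the probability assigned to the gold class, so the two example predictions do get different losses (assuming the vectors are treated as probabilities):

```python
import numpy as np

# Cross-entropy with a one-hot target reduces to -log(p[gold label]).
# Class 0 gets probability 0.4 in y_pred1 and 0.04 in y_pred2.
loss1 = -np.log(0.4)
loss2 = -np.log(0.04)
print(round(loss1, 3), round(loss2, 3))  # 0.916 3.219
```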

0 Answers