I couldn't find a TensorFlow built-in that accepts labels which don't sum to 1, so I tried writing my own (inputs have shape [batch_size, labels]):
tf.reduce_mean(
    tf.reduce_sum(y_true, axis=1) * tf.reduce_logsumexp(y_pred_logits, axis=1)
    - tf.reduce_sum(y_true * y_pred_logits, axis=1))
However, it doesn't seem to be working (the loss is diverging). Did I do something wrong?
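For reference, the expression is meant to be the soft-label cross-entropy -sum_i y_i * log(softmax(z)_i), rewritten as sum(y) * logsumexp(z) - sum(y * z). Below is a minimal plain-Python sketch that checks only this algebra on a tiny batch, not the TensorFlow code itself. The function names and sample values are made up for illustration, and it assumes non-negative labels.

```python
import math

def soft_label_cross_entropy(y_true, logits):
    """Batch mean of sum(y) * logsumexp(z) - sum(y * z), as in the expression above."""
    losses = []
    for y_row, z_row in zip(y_true, logits):
        z_max = max(z_row)  # subtract the max before exponentiating, for stability
        lse = z_max + math.log(sum(math.exp(z - z_max) for z in z_row))
        losses.append(sum(y_row) * lse - sum(y * z for y, z in zip(y_row, z_row)))
    return sum(losses) / len(losses)

def reference_cross_entropy(y_true, logits):
    """Batch mean of -sum(y * log(softmax(z))), computed directly from the definition."""
    losses = []
    for y_row, z_row in zip(y_true, logits):
        denom = sum(math.exp(z) for z in z_row)
        losses.append(-sum(y * math.log(math.exp(z) / denom)
                           for y, z in zip(y_row, z_row)))
    return sum(losses) / len(losses)

# Hypothetical sample batch; the label rows deliberately do not sum to 1.
y_true = [[0.5, 0.0, 1.5], [0.2, 0.2, 0.2]]
logits = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]

difference = abs(soft_label_cross_entropy(y_true, logits)
                 - reference_cross_entropy(y_true, logits))
print(difference)  # expected to be at floating-point rounding level
```

If the two functions agree on inputs like these, the rewritten form matches the definition I had in mind, at least for this sample.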