I am doing multilabel classification with a recurrent neural network. My question is about the loss function: my output will be vectors of true/false (1/0) values indicating each label's class. Many resources say the Hamming loss is the appropriate objective. However, the Hamming loss poses a problem for gradient calculation: `H = average(y_true XOR y_pred)`, and XOR is not differentiable, so the gradient of the loss cannot be computed. Are there other loss functions suitable for training multilabel classification? I've tried MSE, and binary cross-entropy with an individual sigmoid on each output.
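To make the gradient problem concrete, here is a small NumPy sketch (function names are mine) contrasting the Hamming loss on thresholded predictions, which is a step function with zero gradient almost everywhere, with binary cross-entropy on sigmoid probabilities, which is smooth:

```python
import numpy as np

# Hamming loss on hard 0/1 predictions: the XOR (here, inequality)
# is piecewise constant, so it gives no useful gradient for training.
def hamming_loss(y_true, y_pred_hard):
    return np.mean(y_true != y_pred_hard)

# Binary cross-entropy on sigmoid probabilities: smooth in y_prob,
# so gradients flow through the network during training.
def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob)
                    + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([[1, 0, 1], [0, 1, 1]])
y_prob = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6]])
y_hard = (y_prob > 0.5).astype(int)

print(hamming_loss(y_true, y_hard))         # 0.0: all thresholded predictions match
print(binary_cross_entropy(y_true, y_prob)) # > 0: still penalizes unconfident probabilities
```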
1 Answer
`H = average(y_true*(1-y_pred) + (1-y_true)*y_pred)`
is a continuous approximation of the Hamming loss.
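As a quick sanity check, this relaxation can be sketched in NumPy (the function name is mine): on hard 0/1 predictions it reduces to the exact Hamming loss, while on probabilities it is linear in `y_pred` and therefore differentiable everywhere.

```python
import numpy as np

# Continuous relaxation of the Hamming loss from the answer:
# y_true is the actual 0/1 label matrix, y_pred the predicted probabilities.
def soft_hamming(y_true, y_pred):
    return np.mean(y_true * (1 - y_pred) + (1 - y_true) * y_pred)

y_true = np.array([[1, 0, 1], [0, 1, 1]])

# With hard 0/1 predictions it equals the exact Hamming loss:
y_hard = np.array([[1, 0, 0], [0, 1, 1]])
print(soft_hamming(y_true, y_hard))  # 1 mismatch out of 6 entries -> 1/6

# With probabilities it is smooth, so gradients exist everywhere:
y_prob = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6]])
print(soft_hamming(y_true, y_prob))
```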

Juan Wang
- Hi Juan, thanks for your answer. In your approximation equation, I'm wondering whether y_true and y_pred are probabilities or actual labels? – William Chou Jul 25 '17 at 18:54
- y_true is the actual labels, and y_pred is the probability. – Juan Wang Jul 26 '17 at 19:11