0

First, let me describe my question and situation. I want to do multi-label classification in Chainer, and my class imbalance problem is very serious.

In this case I must slice the output vector in order to calculate the loss function. For example, in multi-label classification most elements of the ground-truth label vector are 0 and only a few are 1. In this situation, directly applying F.sigmoid_cross_entropy to all the 0/1 elements may cause training not to converge, so I decided to use a slice a[[xx,xxx,...,xxx]] (where a is the chainer.Variable output by the last FC layer) to pick out specific elements for the loss calculation. Furthermore, because the label imbalance can cause poor classification performance on the rare classes, I want to give the rare ground-truth labels a high loss weight during back propagation, and give the frequent labels (those that occur too often in the ground truth) a low weight.

How should I do this? What do you suggest for training an imbalanced multi-label classification problem in Chainer?

machen
  • I could not find this feature for sigmoid_cross_entropy, but softmax_cross_entropy has a class_weight argument which does exactly what you want. https://docs.chainer.org/en/stable/reference/generated/chainer.functions.softmax_cross_entropy.html#chainer.functions.softmax_cross_entropy You may also refer to its implementation to adapt the idea to sigmoid_cross_entropy for your situation. https://github.com/chainer/chainer/blob/v3.0.0rc1/chainer/functions/loss/softmax_cross_entropy.py#L249-L253 – corochann Oct 10 '17 at 00:13
  • As far as I know, softmax_cross_entropy is not suitable for multi-label classification: in a multi-label problem, multiple labels of one image may be 1, whereas softmax_cross_entropy assumes that only one label can be true, doesn't it? – machen Oct 10 '17 at 01:16
  • If softmax_cross_entropy is not suitable for the multi-label problem, I have another idea: how about using F.tile(pred) to copy specific elements (the rare-class prediction elements) multiple times and then calculating the loss, so that during back propagation the loss of the rare classes is enhanced? (I don't know whether F.tile can do this.) – machen Oct 10 '17 at 01:32

2 Answers

0

If you are working on multi-label classification, how about using the softmax_cross_entropy loss?

softmax_cross_entropy can take class imbalance into account via the class_weight argument. https://github.com/chainer/chainer/blob/v3.0.0rc1/chainer/functions/loss/softmax_cross_entropy.py#L57

https://docs.chainer.org/en/stable/reference/generated/chainer.functions.softmax_cross_entropy.html
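
For reference, here is a minimal sketch of how class_weight might be used; the shapes, logits, and weight values below are made up for illustration:

    import numpy as np
    import chainer.functions as F

    # Made-up data: logits for a batch of 4 samples and 3 classes,
    # and the ground-truth class index of each sample.
    y = np.random.randn(4, 3).astype(np.float32)
    t = np.array([0, 2, 1, 0], dtype=np.int32)

    # One weight per class; rare classes get a larger weight.
    # The values here are arbitrary example numbers.
    class_weight = np.array([0.2, 1.0, 5.0], dtype=np.float32)

    loss = F.softmax_cross_entropy(y, t, class_weight=class_weight)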

himkt
  • As far as I know, softmax_cross_entropy is not suitable for multi-label classification: in a multi-label problem, multiple labels of one image may be 1, whereas softmax_cross_entropy assumes that only one label can be true, doesn't it? – machen Oct 10 '17 at 01:15
  • If softmax_cross_entropy is not suitable for the multi-label problem, I have another idea: how about using F.tile(pred) to copy specific elements (the rare-class prediction elements) multiple times and then calculating the loss, so that during back propagation the loss of the rare classes is enhanced? (I don't know whether F.tile can do this.) – machen Oct 10 '17 at 01:32
  • You are right. I'm sorry, I confused multi-label with multi-class. – himkt Oct 10 '17 at 03:16
0

You can use sigmoid_cross_entropy() in no-reduce mode (by passing reduce='no') to obtain a loss value at each spatial location, and then use the average() function for weighted averaging.

sigmoid_cross_entropy() first computes the loss value at each spatial location and for each item along the batch dimension, and then takes the mean or the summation over the spatial and batch dimensions (depending on the normalize option). You can disable this reduction part by passing reduce='no'. If you want to do a weighted average, pass reduce='no' so that you get the loss value at each location and can reduce the values yourself, as sketched below.
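
As an illustration, here is a minimal sketch of the no-reduce mode; the shapes and the 0/1 labels below are made up for the example:

    import numpy as np
    import chainer.functions as F

    # Made-up data: raw scores from the last FC layer for 4 samples and
    # 10 labels, plus sparse 0/1 multi-label ground truth of the same shape.
    y = np.random.randn(4, 10).astype(np.float32)
    t = (np.random.rand(4, 10) > 0.9).astype(np.int32)

    # reduce='no' keeps the loss of every element instead of averaging it.
    elementwise_loss = F.sigmoid_cross_entropy(y, t, reduce='no')
    print(elementwise_loss.shape)  # (4, 10): one loss value per label and sample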

After that, the simplest way to do the weighted averaging manually is to use average(), which accepts a weights argument for the averaging weights. It first computes the weighted sum of the input using the weights, and then divides the result by the sum of the weights. You can build a weight array with the same shape as the input and pass it to average() together with the raw (unreduced) loss values obtained from sigmoid_cross_entropy(..., reduce='no'). It is also fine to multiply by a weight array manually and take the summation, as in F.sum(score * weight), if the weights are appropriately scaled (e.g. they sum to 1).
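
Continuing the sketch above, the weighted reduction could look like the following; the up-weighting factor for the positive (rare) labels is an arbitrary example value, not a prescribed setting:

    # Up-weight the positive (rare) labels; 10.0 is just an example factor.
    weight = np.where(t == 1, 10.0, 1.0).astype(np.float32)

    # Weighted average: F.average divides the weighted sum by the sum of weights.
    loss = F.average(elementwise_loss, weights=weight)

    # Equivalent manual version: multiply, sum, and normalize yourself.
    loss_manual = F.sum(elementwise_loss * weight) / float(np.sum(weight))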

Seiya Tokui
  • Thank you a lot, you have truly helped me. I think that which weights are most appropriate will depend on a lot of experiments (because the black box of deep learning is too black). The parameter-tuning procedure is uncomfortable, which means I spend many days observing which parameters give the best loss and accuracy. If Chainer had a visualization and management tool, it would be nice for tuning parameters. – machen Oct 10 '17 at 13:23