
This question follows on from Does slice or index of chainer.Variable to get item in chainer has backward ability?. Consider a typical example: suppose I have a convolutional layer plus an FC layer, and the last FC layer outputs a vector.

In some cases I must slice that vector to calculate the loss function. For example, in multi-label classification, most elements of the ground-truth label vector are 0 and only a few are 1, so applying F.sigmoid_cross_entropy to the whole vector directly may cause a label-imbalance problem. Therefore I decided to use a[0, 1] (where a is the chainer.Variable output by the last FC layer) to slice out specific elements for the loss calculation.

In this situation, how does the gradient flow back (BP) through the last FC layer, and how does it update its weight matrix?
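For concreteness, here is a minimal sketch of the setup described above. The architecture, the index set idx, and the labels t are illustrative assumptions only, and it assumes a Chainer version whose Variable indexing (F.get_item) supports integer-array indices:

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class ConvFC(chainer.Chain):
    """Toy conv + FC model; the architecture is illustrative only."""
    def __init__(self, n_labels):
        super(ConvFC, self).__init__()
        with self.init_scope():
            self.conv = L.Convolution2D(None, 16, ksize=3)
            self.fc = L.Linear(None, n_labels)

    def __call__(self, x):
        h = F.relu(self.conv(x))
        return self.fc(h)  # a: shape (batch, n_labels)

model = ConvFC(n_labels=10)
x = np.random.rand(1, 1, 28, 28).astype(np.float32)
a = model(x)

# Hypothetical index set: all positive labels plus a few sampled
# negatives, to mitigate the class imbalance.
idx = [0, 1, 2, 5]
t = np.array([1, 0, 0, 1], dtype=np.int32)  # labels for the kept elements

loss = F.sigmoid_cross_entropy(a[0, idx], t)
loss.backward()  # gradients flow through the slice into the FC layer
```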

machen

1 Answer


When you write b = a[index] for a Variable a and a slice or index index (which may be fancy indexing), backpropagating through this operation copies the values of b.grad into a.grad[index], leaving all other elements of a.grad zero (because the corresponding elements of a do not affect the loss value). The backprop of the last FC layer then computes the gradients w.r.t. the weight matrix and bias vector as usual from this a.grad.
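A small self-contained check of this behavior (the array values and indices are arbitrary; it assumes a Chainer version whose F.get_item supports fancy indexing):

```python
import numpy as np
import chainer
import chainer.functions as F

a = chainer.Variable(np.zeros((1, 5), dtype=np.float32))

# Fancy indexing goes through F.get_item, which is differentiable.
b = a[0, [1, 3]]

loss = F.sum(2.0 * b)  # toy scalar loss
loss.backward()

print(a.grad)
# [[0. 2. 0. 2. 0.]]  -- b.grad was scattered into a.grad[index];
# every other element of a.grad stays zero.
```

So the FC layer's parameters still receive nonzero gradients, but only from the output elements selected by the slice.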

Seiya Tokui
  • I did an experiment: I converted the MNIST dataset's single-label problem into a multi-label problem, then used chainer.Variable slicing to keep the positive/negative ratio at 1/3 as described in the question, and trained with F.sigmoid_cross_entropy. Training accuracy increases much more slowly with this approach than when I directly use F.softmax_cross_entropy without slicing. Is this because most of the gradient is 0 in each iteration, so training is slow? – machen Sep 08 '17 at 04:20
  • If w and x have different shapes, then dw and dx have different shapes too. How can the top layer's dx be multiplied element-wise by the previous layer's dw so that it affects the update of that previous layer's w? – machen Oct 27 '17 at 01:29