
Suppose I have pretrained the network on another database. Because my own database is not very diverse, overfitting is a serious problem. I want to load the pretrained network parameters in Chainer v2.0 but fix the first several layers. What should I use in Chainer v2.0 for this? I know Chainer v1 had the volatile keyword, but it is deprecated in v2.0.

Should I use with chainer.no_backprop_mode(): inside def __call__ when processing the first several layers?

machen

1 Answer


Yes, you can use the chainer.no_backprop_mode() context manager in your forward computation to fix the parameters of specific layers. Here is an example:

def __call__(self, x):
    # Computation inside this context is excluded from the backward
    # graph, so l1 receives no gradient and its parameters stay fixed.
    with chainer.no_backprop_mode():
        h1 = F.relu(self.l1(x))
    h2 = F.relu(self.l2(h1))
    return self.l3(h2)
kmaehashi
  • I asked a person at the Chainer company, and he said: "If you are using Chainer v2 or later, you can set `param.update_rule.enabled = False`, with which the optimizer will not update the parameter." Which is the best solution? – machen Nov 05 '17 at 13:40
  • `no_backprop_mode` makes backprop skip the gradient computation for the layers, but it od – Seiya Tokui Nov 05 '17 at 15:07
  • (The above comment is a mistake; please ignore) Note that "no backprop mode" just skips the gradient computation which results in zero gradients for `l1`. If you are using, e.g., MomentumSGD which was used to optimize `l1` (for pretraining) and you did not clean the "state" of the optimizer (in this case, the accumulated gradients of past iterations), this optimizer will change the parameter even if the gradient is zero. If you are using a fresh new optimizer for finetuning (or using SGD, which is stateless), the parameter will be kept unchanged by the no backprop mode. – Seiya Tokui Nov 05 '17 at 15:22
  • Do you mean that inside the "chainer.no_backprop_mode()" context there is still a chance the parameters get updated, e.g. with MomentumSGD? My question is very simple: how do I fix some layers' parameters? Is "chainer.no_backprop_mode()" enough? – machen Nov 27 '17 at 12:50
  • Do you mean we must use "param.update_rule.enabled = False" to fix some layers' parameters? How do I use that? Do you have some example code? – machen Nov 27 '17 at 12:51