
I have come across a problem while implementing the stochastic depth regularization approach in TensorFlow. The paper (https://arxiv.org/pdf/1603.09382.pdf) states that the model can converge faster if we randomly drop some residual units during training. The existing Torch implementation works perfectly. In TensorFlow, I can put conditions on the residual-unit branches so that the activations are cancelled during the forward step, but the weights are still updated during the backward step. There is no way to tell that these weights (in the residual branch we cancelled) are no longer trainable and should not be included in the optimization for the current session run.
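To make the setup concrete, here is a minimal sketch of the kind of conditional residual unit I mean. The `residual_branch` callable, `keep_prob` value, and `is_training` boolean tensor are my own placeholders, not part of the paper's code:

```python
import tensorflow as tf

def stochastic_depth_unit(x, residual_branch, keep_prob, is_training):
    # `residual_branch` is a hypothetical callable that builds the
    # convolutional branch of the residual unit.
    h = residual_branch(x)

    # Bernoulli "survival" gate drawn once per session run:
    # True -> keep the branch this step, False -> drop it.
    gate = tf.random_uniform([]) < keep_prob

    # During training the forward activation of a dropped branch is
    # replaced with zeros; at test time the branch is always active,
    # scaled by its survival probability, as in the paper.
    train_out = tf.cond(gate, lambda: h, lambda: tf.zeros_like(h))
    test_out = keep_prob * h
    branch = tf.cond(is_training, lambda: train_out, lambda: test_out)

    # Even when the branch is dropped, its weights stay in the
    # trainable-variables collection, which is the crux of the question.
    return tf.nn.relu(x + branch)
```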

I have created an issue on GitHub where I covered how this problem could be solved in a naive way. Of course, there is probably something under the hood that prevents such an easy fix; otherwise it is really strange that tf.Variable's trainable parameter does not accept a boolean Tensor as a value. If someone has a clue about this question, I would really appreciate it if you restore my faith in TensorFlow :)

Artem Artemev

1 Answer


The trainable parameter controls whether the graph to train that variable is built or not. Using a conditional stop-gradient (a tf.cond with tf.identity in one branch and tf.stopgradient in the other) will stop the gradient from reaching that variable.
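As a rough sketch of that idea (the `gate` tensor and `maybe_stop_gradient` helper are hypothetical names; the documented op is spelled `tf.stop_gradient`, see the comment below):

```python
import tensorflow as tf

def maybe_stop_gradient(branch_output, gate):
    # `gate` is a boolean scalar tensor: True -> let gradients flow
    # through the branch this step, False -> pass the activations
    # through unchanged but block the gradient.
    return tf.cond(gate,
                   lambda: tf.identity(branch_output),
                   lambda: tf.stop_gradient(branch_output))
```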

However, if the variable's value was not used during the forward step, the computed gradient is guaranteed to be 0, and hence the update will be a no-op.
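A toy illustration of that no-op behaviour, simplified to a variable that is not used in the loss at all:

```python
import tensorflow as tf

x = tf.Variable(1.0)
y = tf.Variable(2.0)
loss = 3.0 * x  # y's value is not used in the forward pass

grads = tf.gradients(loss, [x, y])
# grads[1] is None: no gradient flows to y, and
# Optimizer.apply_gradients skips variables whose gradient is None,
# so the "update" for y is effectively a no-op.
```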

Alexandre Passos
  • Thank you for the comment. Yes, you are right in some sense. Even though we do not include the gradient computation in the graph, we still can't prevent the gradients from being applied to the parameters, because they are in the list of trainable variables. And that's the main point. PS: just for the sake of compliance with the docs, `tf.stopgradient` should be `tf.stop_gradient`. – Artem Artemev Mar 31 '17 at 11:32