I have come across a problem while implementing the stochastic depth regularization
approach in TensorFlow. The paper (https://arxiv.org/pdf/1603.09382.pdf) states that the model can converge faster if we randomly drop some residual units during training. The existing Torch implementation works perfectly. In TensorFlow, I can put conditions on the residual unit branches so that their activations are cancelled during the forward pass, but the weights will still be updated during the backward pass. There is no way to tell TensorFlow that these weights (in the residual branch we cancelled) are no longer trainable and should not be included in the optimization for the current session run.
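To illustrate what I mean, here is a minimal sketch of the kind of gating I am doing (the two-convolution branch and the keep probability are just placeholders, not my actual model):

```python
import tensorflow as tf

def stochastic_depth_unit(x, keep_prob=0.8):
    """Residual unit whose branch is randomly dropped during training (sketch)."""
    channels = x.get_shape().as_list()[-1]

    # Residual branch -- a stand-in for the real unit from the paper.
    h = tf.layers.conv2d(x, channels, 3, padding='same', activation=tf.nn.relu)
    h = tf.layers.conv2d(h, channels, 3, padding='same')

    # Bernoulli gate: the branch survives with probability keep_prob.
    survives = tf.cast(tf.less(tf.random_uniform([]), keep_prob), tf.float32)

    # Multiplying by the gate cancels the forward activations when the unit
    # is dropped, but the branch variables stay in tf.trainable_variables(),
    # so they are still handed to the optimizer on every session run.
    return x + survives * h
```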
I have created an issue on GitHub where I described how this problem could be solved in a naive way. Of course, there is probably something under the hood that prevents such an easy fix; otherwise it is really strange that tf.Variable's trainable parameter does not allow a boolean Tensor as a value. If someone has a clue about this, I would really appreciate it if you could restore my faith in TensorFlow :)