
If the answer is yes, is it also true that arbitrary ops/functions/methods can be backpropagated through? For example, what is the gradient of scalar_y w.r.t. tensor_x in the code below:

step1: scalar_y = tf.size(tensor_x) OR scalar_y = paddle.fluid.layers.size(tensor_x)

step2: tensor_z = scalar_y * tensor_x
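
For reference, a minimal sketch of how one could check this empirically (this assumes TensorFlow 2.x eager mode, and adds a tf.cast because tf.size returns an int32 scalar that cannot multiply a float tensor directly):

import tensorflow as tf

tensor_x = tf.Variable([[1.0, 2.0], [3.0, 4.0]])

with tf.GradientTape(persistent=True) as tape:
    scalar_y = tf.cast(tf.size(tensor_x), tf.float32)  # step1 (with an explicit cast)
    tensor_z = scalar_y * tensor_x                      # step2

print(tape.gradient(scalar_y, tensor_x))  # None: tf.size depends only on the shape, not on the values
print(tape.gradient(tensor_z, tensor_x))  # [[4. 4.] [4. 4.]]: scalar_y is treated as a constant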

If the answer is no, then under what circumstances, or for which (types of) ops, does this happen, and why?
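
For concreteness, the one case I am fairly sure about is ops with integer outputs, such as tf.size or tf.argmax; a small sketch, again assuming TF 2.x:

import tensorflow as tf

x = tf.Variable([1.0, 3.0, 2.0])

with tf.GradientTape() as tape:
    idx = tf.argmax(x)                               # int64 output, no gradient is registered
    y = tf.cast(idx, tf.float32) * tf.reduce_sum(x)  # mix it back into a float computation

# Only the tf.reduce_sum branch contributes; the argmax branch acts as a constant (here idx == 1).
print(tape.gradient(y, x))  # [1. 1. 1.]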


One more concrete example (an implementation of DropBlock): lines 34-37 of https://github.com/DHZS/tf-dropblock/blob/master/nets/dropblock.py

My question is whether it is necessary to wrap all the variables below in tf.stop_gradient:

output = inputs * mask * tf.to_float(tf.size(mask)) / tf.reduce_sum(mask)
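
Here is how I would try to check this empirically (a sketch, assuming TF 2.x, with tf.cast in place of the deprecated tf.to_float and a made-up random mask standing in for the sampled DropBlock mask):

import tensorflow as tf

inputs = tf.Variable(tf.random.normal([2, 4, 4, 1]))
# Stand-in for the sampled mask; in the linked code it comes from a Bernoulli sample
# plus max_pool and does not depend on `inputs`.
mask = tf.cast(tf.random.uniform([2, 4, 4, 1]) > 0.3, tf.float32)

with tf.GradientTape(persistent=True) as tape:
    out_plain = inputs * mask * tf.cast(tf.size(mask), tf.float32) / tf.reduce_sum(mask)
    out_stopped = inputs * tf.stop_gradient(
        mask * tf.cast(tf.size(mask), tf.float32) / tf.reduce_sum(mask))

g_plain = tape.gradient(out_plain, inputs)
g_stopped = tape.gradient(out_stopped, inputs)

# If this prints ~0.0, wrapping the mask term in tf.stop_gradient does not change
# d(output)/d(inputs), since the mask is not a function of the inputs anyway.
print(tf.reduce_max(tf.abs(g_plain - g_stopped)).numpy())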

Can anybody help?

Victor Li